



Jttiaca, New $nrk 



BOUGHT WITH THE INCOME OF THE 

SAGE ENDOWMENt FUND 

THE GIFT OF 

HENRY W. SAGE 

1891 

ENGINEERING LIBRARY 



Date Due 



i\iuv 



D l C^l 1959 



DECl 



4 ^959 



jEB-..l.Jiiia^ 



jia^ 



m/ zi m^c^ 



NO V, 5 



196^ 



'^*m««i^dwiMMM 



T^Ht 



ftinii' 7 



t^iSit- 



-££ 



' • iiV- 



/> 196h 



Q0T2! i 1967 



yunJ 2^ 'Z^ 



PRINTED IN 



(**f 



NO. Z3233 



Cornell University Library 
QA 273.F52 
"Vhe mathematical theory of probabilities 






3 1924 004 250 456 




Cornell University 
Library 



The original of tiiis book is in 
tine Cornell University Library. 

There are no known copyright restrictions in 
the United States on the use of the text. 



http://www.archive.org/details/cu31924004250456 



THE MATHEMATICAL THEORY OF 
PROBABILITIES 



_^r^^ 



THE MACMILLAN COMPANY 

NEW YORK BOSTON CHICAGO 

DALLAS SAN FRANCISCO 

MACMILLAN & CO., Limited 

LONDON BOMBAY CALCUTTA 
MELBOURNE 

THE MACMILLAN CO. OF CANADA. Ltd. 

TORONTO 



THE MATHEMATICAL THEORY 

OF 

PROBABILITIES 

AND ITS APPLICATION TO 

FREQUENCY CURVES AND STATISTICAL 

METHODS 

BY 

ARNE FISHER. 

TRANSLATED PROM THE DANISH 
BY 

CHARLOTTE DICKSON, B.A. 

(COLUMBIA) 

MATHEMATICAL ASSISTANT IN THE DEPARTMENT OF DEVELOPMENT 
AND RESEARCH OF 
THE AMERICAN TELEPHONE AND TELEGRAPH COMPANY 

AND 

WILLIAM BONYNGE, B.A. 

(BELFAST) 

WITH INTRODUCTORY NOTES 
BY 

M. C. RORTY 

AND 

F. W. FRANKLAND, F.I.A., F.A.S, F.S.S. 

VOLUME I 

Mathematical Probabilities, Frequency Curves, Homograde and 
Heterograde Statistics 



SECOND EDITION GREATLY ENLARGED 



NEW YORK 

THE MACMILLAN COMPANY 

1922 






Copyright, 1915 and 1922, 
By ARNE fisher 



Set up and electrotyped. Published November, 1915. 
Second Edition, greatly enlarged, May, 1922. 






FEINTED IN TUB UNITED STATES OF AMBEICA 



x'/^ 



INTRODUCTORY NOTE TO THE SECOND EDITION. 

Mr. Fisher has requested that an introduction be written to 
this, the second edition of his work on probabilities, which shall 
indicate some of the practical applications of the mathematical 
theory with which his treatise deals. 

The writer has only a limited knowledge of mathematical 
technique — ^yet it has so happened that in twenty-five years of 
active work as engineer, statistician and executive he has had 
frequent occasion to call upon the skill of trained mathematicians 
for the solution of practical problems involving frequency curves 
and probabilities. Among such mathematicians none has been 
more helpful, or quicker to perceive the possibility of making 
valuable applications of higher mathematics to business problems', 
than Mr. Fisher himself. For this reason it is a duty as well as a 
privilege to outline, at his request, certain actual practical expe- 
riences with mathematical applications and to indicate such 
possible applications for the future. 

The writer's initial experience with frequency curves and 
probabilities was in the years 1902 and 1903, when it became 
evident, in analyzing various problems in telephone traffic, that 
certain- peak loads, which were superimposed upon the normal 
seasonal, weekly, and daily fluctuations, could be accounted for 
only by the laws of chance. Recourse was, therefore, had to the 
formulae then available for approximate summations of the terms 
of the binomial expansion, and from these a series of curves was 
drawn which indicated for any given normal hourly traffic (as 
indicated by studies of seasonal, weekly, and daily variations) the 
probability that any given short period load would be equalled or 
exceeded.. Practical experience with these curves soon showed that, 
in spite of minor errors, they were close enough to the real facts 
to make them of primary importance in traffic studies of all kinds, 
and particularly in the development of mechanical switching de- 
vices. Their use for such purposes has now become a common- 
place in telephone engineering. 

As a by-product of the preceding application there have been 
other interesting uses of the same probability curves. Effective 
studies have been made of the decrease in the total stocks of small 
machine parts that could be made possible by standardizing and 
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reducing the number of types of screws, bolts, nuts, etc. The 
curves can also be applied directly to every line of business and 
every type of operation where prompt service must be given and 
where the demand arises from a large number of independent 
sources, and is, therefore, subject to peak loads determined by the 
laws of chance, which may be superimposed upon other "normal" 
peak loads varying with the days of the week, the hours of the 
day, etc. 

Entirely separate applications of frequency curves are those 
necessary in actuarial work. These are relatively well known. 
But it is less generally known that one of the most important of 
business problems, that of depreciation, can be treated effectively 
only when approached on an actuarial basis with a full under- 
standing of the frequency curves which go^•ern the displacement, 
year by year, of the physical units involved. 

A still further use of frequency curves and the theory of probabil- 
ities, which is of immediate practical importance, is in connection 
with sampling operations. The theory of sampling has already 
been well developed, but adequate efforts have not yet been made 
by mathematicians to reduce the processes of sampling to de- 
pendable simple rules that can be applied by business executives 
and statisticians untrained in higher mathematics. In census 
work, and in statistical and other reports made by business or- 
ganizations, the waste of money, that could be avoided by an 
inteUigent application of the theory of sampling, is very great. 
Not only can many reports and analyses be made much more 
cheaply and quickly by sampling processes, but they can also be 
made more accurately. Many important items of information can" 
be determined only by trained specialists. In such cases the only 
procedure, that does not involve prohibitive expense in large census 
operations, is to tie such items, by a sampling process, to other 
items which are susceptible of exact enumeration Ijy lelatively 
miskilled enumerators, and then to compute the totals for the 
special items from the relations of such items to the items which 
are completely enumerated. 

All of the preceding are in the field of immediate practicalities. 
When we come to the future, one of the most pro:iiLiag uses of 
mathematics is in the development of logical processes. It is not 
going too far to say that all business, and most engineering opera- 
tions are fundamentally based on probabilities. The business man 
is always dealing in degrees of uncertainty, and even the engineer 
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has only occasionally a definite set of conditions upon which to 
base his computations. Where the problem is primarily a financial 
one, he must balance the cost of overbuilding against the cost of 
underbuilding; and, if he combines business judgment with en- 
gineering skill, he will multiply the amount of each possible loss 
by the probability of its occurring, and will ordinarily choose, 
among all possible plans, the plan which involves the minimum 
probable loss. Here it is not inappropriate to interject the idea 
that the most practical logic must always be in terms of probabil- 
ities, and that a logic which deals, or pretends to deal, in certainties 
only is not alone useless, but is also harmful and misleading, when 
difficult problems are to be approached. Such problems can 
rarely, if ever, be solved except through the cumulation toward a 
certainty of many small probabilities established from uncorre- 
lated, or only partially correlated, viewpoints. 

A final suggestion which is to-day speculative, but may assume 
important practical aspects in the near future, is with respect to 
the applications of frequency curves and probabilities to physical 
and cosmic mathematics. In such mathematics we are forced to 
assume that all of our measures must arise out of the things meas- 
ured. When we deal with physical velocities, it would seem that 
our only measures of velocity can arise out of the velocities them- 
selves. Similar considerations hold true with respect to funda- 
mental measures of physical extension. Under these circum- 
stances we may talk in terms of infinite space and of infinite time, 
but we can hardly talk in terms of infinites when we are dealing 
with the dimensions of atonuc structure and the velocities of 
material particles. In these cases it seems very highly probable 
that we are dealing with frequency distributions which we must 
measure and define in terms derived from such distributions them- 
seh'es. "\Mth respect to such measures some of our frequency 
curves may have infinite "tails," but it is more probable that the 
frequency forms are such that they can be completely defined in 
finite terms. Along this same line, we may even risk a closing 
speculation that the relative proportions of organized matter and 
space in the stellar universe are determined through the opera- 
tions of the laws of chance in establishing heterogeneities in what 
is otherwise a homogeneous void-filling medium. 

M. C. RORTY. 

New York City, 
March 2, 1922. 
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At the time when the first edition of this little book was published 
in 1916, I expected to issue a second volume shortly after, dealing 
with frequency curves and frequency surfaces as well as the re- 
lated problem of co-variation (correlation). The manuscript for 
this volume was completed and printing had already commenced 
on some of the chapters, when a series of misfortunes, not neces- 
sarily imexpected, overtook the work. A major part of the manu- 
script while in transit to a friend in Denmark for review and cor- 
rections went down with a Danish vessel when torpedoed by an 
outlaw German submarine. A duplicate copy was for some reason 
or other withheld by the British military censor and not returned 
to the writer until long after the termination of the world war. 
My third and final copy of the manuscript, which I had submitted 
to an American friend for critical review was also lost in transit. 
The veritable nemesis which seems to have followed my efforts is, 
however, only a verification of the all prevailing laws of chance, 
which every serious minded student must face with unperturbed 
attitude. In fact, the above misfortunes have, after all, only 
made me more determined to complete another collection of notes, 
which I eventually hope to put into proper shape for publication. 

In the meantime the first edition has been out of print for more 
than two years, and when the publisher asked me to prepare a 
new edition I took advantage of this opportunity to add several 
chapters on frequency fimctions and their application to het- 
erograde statistical series so as to give a complete treatment of 
statistical functions involving one variable. The book is, there- 
fore, twice its original size and contains the major part of what I 
originally intended for a second volume. 

The reader will readily notice that my treatment of the subject 
is based throughout upon the principles of the classical probability 
theory as founded by Bernoulli, De Moivre and above all by the 
great Laplace and his disciple, Poisson. I am of the opinion that 
these principles and their further extension by the Scandinavian 
statisticians and actuaries, Gram, Thiele, Westergaard, Charlier, 
Wicksell and Jorgensen, offer as yet the best and also the most 
powerful tools for the treatment of collected statistical data by 
means of mathematical methods. In the way of adumbration and 
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economy of thought the Laplacean methods stand unsurpassed 
in the whole realm of mathematical statistics. I have, therefore, 
in this volume limited my investigations to a systematic treat- 
ment along these lines. I hope, however, in the forthcoming 
second volume to treat the methods of Pearson, Edgeworth, 
Kaptejm, Bachelier and Knibbs and show their relation to La- 
place's theory. 

The reason why the Laplacean doctrine of frequency curves has 
been ignored until comparativel3' recent years and has remained 
more or less obscure is perhaps due to the fact that for more than 
a century it remained a theory pure and simple and was used but 
sparingly in practical calculations. 

Any statistical theory, in order to be of use in practical work, 
must be arranged in such a manner that it is readily adaptable to 
numerical computations. Advanced mathematical computation 
has not been given its due reward and proper attention in our 
ordinary academic instruction. A high grade mathematical 
computer i--. indeed a "rare bird," much more so in fact than a 
good mathematician. To arrange and plan the numerical work 
in connection with the theoretical formula so that the detailed and 
painstaking work is reduced to a minimum, and at the same time 
afford tlie proper means for checking and counterchecking, is by 
no means an easy task and often requires as much ingenuity as 
the actual development of the theoretical formulje. While Gauss 
has always been acknowledged as one of the world's greatest com- 
puters and in addition to his extensive work in pure mathematics 
also did much practical work in surveying, physics, and in financial 
and actuarial investigations, Laplace during his entire career 
remained a pure mathematician and apparently failed to grasp the 
paramount attributes required by a successful computer. His 
attempt to inject himself into public life, as for instance when he 
secured for himself an appointment as minister of the interior, 
must be regarded as a dismal failure as admitted in Napoleon's 
memorandum on his dismissal. 

The failure of Laplace to recognize fully the all-important phase 
of numerical computations in all observations on statistical mass 
phenomena is in my opinion the main reason why the Gaussian 
theory of observations and the allied subject on the theory of least 
squares has hitherto supplanted the admittedly superior theory 
of the great Frenchman. Gauss in addition to his theory furnished 
an essentially useful and elegant method for performing the neces- 
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sary numerical calculations, while Laplace left this decidedly 
important aspect out of consideration altogether. It remained in 
reality to Charlier to furnish the Laplacean doctrine with a prac- 
tical method for computing the various statistical parameters. 
And in the meantime the Gaussian methods reigned supreme 
whUe Laplace's great work was neglected. 

The careful reader will readily notice that in the treatment of 
frequency curves I have allowed the semi-invariants, originally 
introduced in the theory of statistics by Thiele, to occupy a central 
position. In my opinion the semi-invariants represent a more 
powerful tool than the method of moments. I have also tried to 
rescue from oblivion the important and original memoir by the 
Danish actuary, Gram, and give to him and the French math- 
ematician, Hermite, their due recognition as the earliest investi- 
gators of skew frequency distributions. Gram was perhaps the 
first investigator to make proper use of the orthogonal functional 
properties of the Laplacean normal frequency curve and its deriva- 
tives. By means of an application of the orthogonal properties of 
the Hermite polynomials and their close relation to the theory of 
integral equations, the whole theory of frequenc}^ distribution 
can be presented in a decidedly compact form; and I deem no 
apology necessary for ha\-ing introduced in my treatment of 
frequency curves some of the more elementary theorems of integral 
equations, that youngest branch of higher analysis, which at 
present occupies a central position in advanced mathematics. 

The most recent investigations along those lines have been 
made by the Swedish astronomer, Charlier, and his disciples, 
Jorgensen and Wicksell. Unfortunately these investigations have 
hitherto not received adequate and systematic treatment in Eng- 
lish and American texts on statistics, and it is my hope that 
the following pages may be of service in opening the eyes of 
English speaking statisticians to the practical utility of these 
methods. 

The examples have all been selected so as to give a complete and 
detailed illustration of the application of the theory to es;-;entially 
practical problems. I have, on the other hand, purposely refrained 
from giving the customary exercises, so-called, usually found in 
statistical texts, especially those in German and English. 

Although I have been a close student of and have read most of 
the published statistical text-books in about seven languages for 
the last ten years, I regret to state that I have found little or no 
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practical value in such trick exercises, which as a rule have but 
slight relation to problems occurring in daily life. 

Since the appearance of the first edition of this book in 1916 
a number of excellent statistical texts have been issued. Among 
these I may mention a new edition of Yule's well-known ele- 
mentary text, a greatly enlarged edition of Bowley's Elements of 
Statistics, the new treatise by Caradog Jones, an enlarged German 
translation of Charlier's Grunddragen, a very lucid Swedish text by 
Wicksell, the scholarly and broadly planned Statistikens Teori i 
Grundrids (in Danish) by Westergaard, and last but not least, the 
thesis by Jorgensen, Frekvensflader og Korrelation} 

Although an extended residence in the United States has per- 
haps improved my barbaric Dano-English, I fear that I must still 
apologize to the reader for my shortcomings in rhetoric and gram- 
mar. Most of the serious defects have, I hope, been overcome by 
the diligent efforts of my co-editor and translator. Miss C. Dickson, 
mathematical assistant in the department of Development and 
Research of the American Telephone and Telegraph Company. 
Miss Dickson's work has indeed been much beyond that of mere 
translation. Her knowledge of the mathematical theory of prob- 
abilities has enabled her to suggest to me several improvements in 
my Danish notes. 

I am also under great obligations to a number of friends and 
colleagues who have assisted me in the preparation of this volume. 
I am especially indebted to Mr. E. C. Molina, the well-known 
probability expert of the American Telephone and Telegraph 
Company. Mr. Molina's extensive knowledge of the works 
of the old French masters, especially of those of Laplace, 
has been of the greatest value to me, and I can truthfully 
say that I have nowhere met a mathematician so thoroughly 
acquainted with the intricacies of the Theorie Analytique as 
Mr. Molina. 

My thanks are also due to Mr. F. L. Hoffman, the Statistician 
of the Prudential Insurance Company, for the interest he took in 
my work along those lines while I was employed as a computer in 
his department. To Messrs. M. C. Rorty and D. R. Belcher of the 
American Telephone and Telegraph Company, I beg leave to 

' As a pure probability text we may mention G. Castelnuovo's, Calcolo delle 
Probabilita (MUano, 1919), as an exceptionally lucid and rigorous treatise. 
The recently issued Treatise on Probability by J. M. Keynes is briefly discussed 
in paragraph 138 of this book. A. F. 
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express my best thanks for their kind advice and encouragement 
in the preparation of this volume. 

It is indeed impossible to adequately express in a mere formal 
preface my obligations to Mr. Rorty in this matter. His introduc- 
tory note I regard as one of the highest rewards I have received in 
this field of endeavor where one must usually be content with the 
appreciation of one's peers. In this connection it is of interest to 
note that Mr. Rorty is the pioneer investigator in the application 
of the mathematical theory of probabilities to telephone engineer- 
ing, which has been further developed in recent years by Molina of 
America, Erlang and Johaimsen of Denmark, Holm of Sweden, 
Odell and Grinsted of Great Britain. The pioneer work by Mr. 
Rorty in this eminently practical field antedates the earliest work 
by Erlang in Tidsskrift for Matematik by nearly five years. 

Last, but not least, I wish to convey my sincerest thanks to my 
Scandinavian compatriots, Westergaard, Charlier, Jorgensen, 
Wicksell and Guldberg from whose works I have drawn so freely. 
To these gentlemen and to the works of the late Messrs. Gram and 
Thiele of Copenhagen I really owe anything of value which may 
be contained in this work. 

Ahne Fisher. 
New York, 

April, 1922. 



INTRODUCTORY NOTE TO THE FIRST EDITION. 

I feel it a great honor to have been asked by my friend and 
colleague, ^Mr. Arne Fisher, of the Equitable Life Assurance 
Society of the United States, to write an introductory note to 
what appears to me the finest book as yet compiled in the English 
language on the subject of which it treats. As an Examiner 
myself in Statistical Method for a British Colonial Government, 
it has been to me a heart-breaking experience, when implored by 
intending candidates for examination to recommend a text-book 
dealing with IMr. Fisher's subject matter, that it has heretofore 
been impossible for me to recommend one in the English language 
which covers the whole of the ground. Until comparatively 
recent years the case was even worse. While in French, in Italian, 
in German, in Danish, and in Dutch, scientific works on statistics 
were available galore, the dearth of such literature in the English 
language was little short of a national or racial scandal. With 
such works as those of Yule and Bowley, in recent years, there 
has been some possibility for the English-speaking student to 
acquire part of the knowledge needed. But it is hardly necessary 
to point out what a very large amount of new ground is covered 
by Mr. Fisher's new book as compared with such works as I have 
referred to. 

Despite my professional connection with statistical and actu- 
arial work of a technical character my own personal interest in 
INIr. Fisher's book is concentrated principally on the metaphysical 
basis of the Probability-theory, and it is with regard to this 
aspect of the subject alone that I feel quahfied to comment on his 
achievement. With all the controversy that has gone on through 
many decades among metaphysicians and among writers on logic 
interested especially in the bases of the theories of probability and 
induction, between the pure empiricists of the type of J. S. Mill 
and John Venn (at all events in the earliest edition of his work) 
on the one hand, and the (partljO a priori theorists who base their 
doctrine on the foundation of Laplace on the other hand, it has 
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been a source of intense satisfaction to me, as in the main a dis- 
ciple of the latter group of theorists, to note the masterly way in 
which Mr. Arne Fisher disentangles the issues which arise in the 
keen and sometimes almost embittered controversy between these 
two schools of thought. It has always seemed to the present 
writer as if the very foundations of Epistemology were involved 
in this controversy. The impossibility of deriving the corpus of 
human knowledge exclusively from empirical data by any logic- 
ally valid process — an impossibility which led Immanuel Kant 
to the creation of his epoch-making philosophical system — is 
hardly anywhere made more evident than in what seems to the 
present writer the unsuccessful effort of thinkers like John Venn 
to derive from such purely empirical data the entire Theory of 
Probability. The logical fallacy of the process is analogous to 
that perpetrated by John Stuart Mill in endeavoring to base the 
Law of Causality on what he termed an "indudio per simplicem 
enumerationem." Probably there is nowhere a more trenchant 
and conclusive exposure of the unsoundness of this point of view, 
than in the Right Honorable Arthur James Balfour's monu- 
mental work "A Defense of Philosophic Doubt." It is there- 
fore satisfactory to find that Mr. Fisher emphasizes, quite at the 
beginning of his treatise, that an a priori foundation for " Proba- 
bility" judgments is indispensable. 

Hardly less gratifying, from the metaphysical point of view, 
is Mr. Fisher's treatment of the celebrated quaestio vexata of 
Inverse Probabilities and his qualified vindication of Bayes' 
Rule against its modern detractors. 

Aside altogether from metaphysics, it is particularly satis- 
factory to note the full and clear way in which the author treats 
the Lexian Theory of Dispersion and of the "Stability" of sta- 
tistical series and the extension of this theory by recent Scandi- 
navian and Russian investigators, — a branch of the science which 
has till the appearance of this new work not been adequately 
covered in English text-books. 

It may of course be a moot question whether the preference 
given by our author to Charlier's method of treating " Frequency 
Curves" over the method of Professor Karl Pearson is well 
advised. But whatever the experts' verdict may be on debatable 
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questions like these, the scientific world is to be congratulated on 
Mr. Fisher's presentment of a new and sound point of view, and 
he emphatically is to be congratulated on the production of a 
text-book which for many years to come will be invaluable both 
to students and to his confreres who are engaged in extending 
the boundaries of this fascinating science. 

F. W. Frankland, 
Member of the Actuarial Society of America, 
Fellow of the Institute of Actuaries of Great 
Britain and Ireland, and Fellow of the 
Royal Statistical Society of London. 

New York, 
October 1, 1915. 
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" Probability " has long ago ceased to be a mere theory of games 
of chance and is everywhere, especially on the continent, regarded 
as one of the most important branches of applied mathematics. 
This is proven by the increasing number of standard text-books in 
French, German, Italian, Scandinavian and Russian which have 
appeared during the last ten years. During this time the research 
work in the theory of probabilities has received a new impetus 
through the labors of the English biometricians under the leader- 
ship of Pearson, the Scandinaxdan statisticians Westergaard, 
Charlier and Kieer, the German statistical school under Lexis, and 
the brilliant investigations of the Russian school of statisticians. 

Each group of these investigations seems, however, to have 
moved along its own particular lines. The English schools have 
mostly limited their investigations to the field of biology as pub- 
lished in the extensive memoirs in the highly specialized journal, 
Biometrika. The Scandinavian scholars have produced researches 
of a more general character, but most of these researches are un- 
fortunately contained in Scandinavian scientific journals and are 
for this reason out of reach to the great majority of readers who 
are not famihar with any of the allied Scandinavian languages. 
This applies in a still greater degree to the Russians. German 
scholars of the Lex-is school have also contributed important 
memoirs, but strangely enough their researches are little known 
in this country or in England, a fact which is emphasized through 
the belated Enghsh discussion on the theory of dispersion as devel- 
oped by Lexis and his disciples. The same can also be said with 
regard to the Italian statisticians. 

In the present work I have attempted to treat all these modern 
researches from a common point of view, based upon the mathe- 
matical principles as contained in the immortal work of the great 
Laplace, "Theorie analytique des Probabilites," a work which 
despite its age remains the most important contribution to the 

six 
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theory of probabilities to our present day. Charlier has rightly 
observed that the modern statistical methods may be based upon 
a few condensed rules contained in the great work of Laplace. 
This holds true despite the fact that many modern English 
writers of late have shown a certain distrust, not to say actual 
hostility, towards the so-called mathematical probabilities as 
defined by the French savant, and have in their place adopted the 
purely empirical probability ratios as defined by Mill, Venn and 
Chrystal. It is quite true that it is possible to build a consistent 
theory of such ratios, as for an instance is done by the Danish 
astronomer and actuary, Thiele. The theorj-, however, then 
becomes purely a theory of observations in which the theory of 
probability takes a secondary place. The distrust in the so-called 
mathematical a priori probabilities of Laplace I believe, however, 
to be unfounded, and the criticism to which that particular kind 
of probabilities is subjected by a few of the modern English 
writers is, I believe, due to a misapprehension of the true nature 
of the Bernoullian Theorem. This renowned theorem remains 
to-day the cornerstone of the theory of statistics, and upon it I 
have based the most important chapters of the present work. 
Following the beautiful investigations of TschebychefF and 
Pizetti in their proofs of Bernoulli's Theorem and the closely 
related theorem of large numbers by Poisson I have adopted the 
methods of the Swedish astronomer and statistician, Charlier, 
in the discussion of the Lexian dispersion theory. 

The theory of frequency curves is treated from various points 
of view. I have first given a short historical introduction to the 
various investigations of the law of errors. The Gaussian 
normal curve of error was by the older school of statisticians 
held to be sufficient to represent all statistical frequencies, and 
actual observed deviations from the normal curve were attributed 
to the limited number of observations. Through the original 
memoirs of Lexis and the investigations of Thiele the fallacy of 
such a dogmatic belief was finally shown. The researches of 
Thiele, and later of Pearson, developed later the theory of skew 
curves of error. As recently as 1905 Charlier finally showed 
that the whole theory of errors or frequency curves may be 
brought back to the principles of Laplace. I have treated this 
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subject by the methods of both Pearson and Charlier, although I 
•have given the methods of the latter a predominant place, because 
of their easy and simple application in the practical computations 
required by statistical work. The mathematical theory of cor- 
relation, which is treated in an elementary manner only, is based 
upon the same principles. 

The statistical examples serve as illustrations of the theory, and 
it will be noted that it is possible to solve all the important sta- 
tistical problenis presenting themselves in daily work on the basis 
of a theory of mathematical probabilities instead of on a direct 
theory of statistical methods. I have here again followed Charlier 
in dividing all statistical problems into two distinct groups, 
namely, the homograde and the heterograde groups. 

In treating the philosophical side of the subject I have naturally 
not gone into much detail. However, I have tried to emphasize 
the two diametricalh' opposite standpoints, namely the principle 
of what von Kries has called the principle of "cogent reason," 
and the principle which Boole has aptly termed "the equal 
distribution of ignorance." These two principles are clearly illus- 
trated in the case of the so-called inverse probabilities. As far as 
pure theory is concerned, the theory of "inverse probabilities" 
is rigorous enough. It is only when making practical applications 
of the rule of inverse probabilities (the so-called Bayes' Rule) 
that many writers have made a fatal mistake by tacitly assuming 
the principle of "insufficient reason" as the only true rule of com- 
putation. This leads to paradoxical results as illustrated by the 
practical problem from the region of actuarial science in Chapter 
VI In this book. 

In a work of this character I have naturally made an extended 
use of the higher mathematical analysis. However, the reader 
who is not versed in these higher methods need not feel alarmed 
on this account, as the elementary chapters are arranged in such a 
way that the more difficult paragraphs may be left out. I have 
in fact divided the treatise into two separate parts. The first 
part embraces the mathematical probabilities proper and their 
applications to homograde statistical series. This part, I think, 
constitutes what is -usually given as a course in vital statistics in 
many American colleges. I hardly deem it worth while to give a 
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detailed discussion on the collection and arrangement of the sta- 
tistical data as to various frequency distributions. The mere 
graphical and serial representation of frequency functions by 
means of histographs and frequency columns is so simple and 
evident that a detailed description seems superfluous. The fitting 
of the various curves to analytical formulas and the determination 
of the various parameters seem to me of much greater impor- 
tance. The theory of curve fitting which is treated in the second 
volume is founded upon a more advanced mathematical analysis 
and is for this reason out of reach to the average American student 
who desires to learn only the rudiments of modern statistical 
methods. Practical statisticians, on the other hand, will derive 
much benefit from these higher methods. It is a fact generally 
noted in mathematics that the practical application of a difficult 
theory is much simpler than that of a more elementary theory. 
This is amply proven by the appearance of an excellent little 
Scandinavian brochure by Charlier : " Grunddragen af den mate- 
matiske Statistikken." ("Rudiments of Mathematical Statis- 
tics.") I have always attempted to adapt theory to actual 
practical problems and requirements rather than to give a purely 
mathematical abstract discussion. In fact it has been my aim 
to present a theory of probabilities as developed in recent years 
which would prove of value to the practical statistician, the 
actuary, the biologist, the engineer and the medical man, as 
well as to the student who studies mathematics for the sake of 
mathematics alone. 

The nucleus of this work consisted of a number of notes written 
in Danish on various aspects of the theory of probabilities, col- 
lected from a great number of mathematical, philosophical and 
economic writings in various languages. At the suggestion of 
my former esteemed chief, Mr. H. W. Robertson, F.A.S., As- 
sistant Actuary of the Equitable Life Assurance Society of the 
United States, I was encouraged to collect these fragmentary 
notes in systematic form. The rendering in English was done 
by myself personally with the assistance of I\Ir. W. Bonynge. 
With his assistance most of the idiomatic errors due to my 
barbaric Dano-English have been eliminated. The notes stand, 
however, in the main as a faithful reproduction of my original 
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English copy. Although the resulting " Dano-Enghsh " may 
have its great shortcomings as to rhetoric and grammar, I hope 
to have succeeded in expressing what I wanted to say in such 
a manner that my possible readers may follow me without 
difficulty. 

I gladly take the opportunity of expressing my thanks to a 
number of friends and colleagues who in various ways have as- 
sisted me in the preparation of this work. ]\Iy most grateful 
thanks are due to Mr. F. W. Frankland, Mr. H. W. Robertson 
and Mr. Wm. Bonynge not only for reading the manuscript and 
most of the proofs, but also for the friendly help and encourage- 
ment in the completion of this volume. The introductory note 
by Mr. Frankland, coming from the pen of a scholar who for the 
most of a life-time has worked with statistical-mathematical 
subjects and who has taken a special interest in the philosophical 
and metaphysical aspects of the probabiHty theory, I regard as 
one of the strong points of the book. ]My debts to INIessrs. 
Frankland and Robertson as well as to Dr. W. Strong, Associate 
Actuary of the INIutual Life Insurance Company, are indeed of 
such a nature that they cannot be expressed in a formal preface. 
INIy thanks are also due to ^Nlr. A. Pettigrew in correcting the 
first rough draught of the first three chapters at a time when my 
knowledge of English was most rudimentary, to ]Mr. j\l. Dawson, 
Consulting Actuary, and ^Ir. R. Henderson, Actuary of the Equit- 
able Life, for reading a few chapters in manuscript and making 
certain critical suggestions, to Professors C. Grove and W. Fite, of 
Columbia University, for numerous technical hints in the working 
out of various mathematical formulas in Chapter VL to Miss 
G. ]Morse, librarian of the Equitable Library, in the search of 
certain bibliographical material. Last but not least I wish to 
express my sincerest thanks to several of my Scandinavian com- 
patriots for allowing me to quote and use their researches on 
various statistical subjects. I want in this connection especially 
to mention Professor Charher, of Lund, and Professors Wester- 
gaard and Johannsen, of Copenhagen. 

To The Alacmillan Company and The New Era Printing Com- 
pany I beg leave to convey my sincere appreciation of their very 
courteous and accommodating attitude in the manufacture of 
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this work. Their spirit has been far from commercial in this — 
from a pure business standpoint — somewhat doubtful under- 
taking. 

Aene Fisher. 
New Yohk, 
October, 1915. 



TABLE OF CONTENTS. 

FART I. 

MATHEMATICAL PROBABILITIES AND HOMOGRADE 

STATISTICS. 

Chapter I. 

Introduction: General Principles and Philosophical Aspects. 

Page 

1 . Methods of Attack 1 

2. Law of Causality 1 

3. Hypothetical Judgments 3 

4. Hypothetical Disjunctive Judgments 4 

5. General Definition of the Probability of an Event 5 

6. Equally likely Cases 6 

7. Objective and Subjective Probabilities 9 

Chapter II. 

Historical and Bibliographical Notes. 

8. Pioneer Writers 11 

9. Bernoulli, de Moivre and Bayes 12 

10. Application to Statistical Data 13 

11. Laplace and Modem Writers 14 

Chapter III. 

The Mathematical Theory of Probabilities. 

12. Definition of Mathematical Probability 17 

13. Example 1 18 

14. Example 2 20 

15. Example 3 20 

16. Example 5 22 

17. Example 6 23 

Chapter IV. 

The Addition and Multiplication Theorems in Probabilities. 

18. Systematic Treatment by Laplace 26 

19. Definition of Technical Terms 26 

20. The Theorem of the Complete or Total Probability, or the Proba- 

bility of "Either Or" 27 

21. Theorem of the Compound Probability or the Probability of "As 

Well As" 28 

22. Poincare's Proof of the Addition and Multiphcation Theorem 30 

23. Relative Probabilities 31 

24. Multiphcation Theorem 33 

25. Probability of Repetitions 33 

I* XXV 



XXVI TABLE OF CONTENTS. 

26. Application of the Addition and Multiplication Theorems in Problems 

in Probabilities 35 

27. Example 12 35 

28. Example 13 36 

29. Example 14 37 

30. Example 15 37 

31. Example 16 38 

32. Example 17 39 

33. Example 18. De JNIoivre's Problem 40 

34. Example 19 42 

35. Example 20. Tchebycheff 's Problem 46 

Chapter V. 

Mathematical Expectation. 

36. Definition, Mean Values 49 

37. The Petrograd (St. Petersburg) Problem 51 

38. Various Explanations of the Paradox. The Moral Expectation. ... 51 

Chapter \1. 

Probability a Posteriori. 

39. Bayes's Rule. A Posteriori Probabilities 54 

40. Discovery and History of the Rule 55 

41. Bayes's Rule (Case I) 56 

42. Bayes's Rule (Case II) 59 

43. Determination of the Probabilities of Future Events Based upon 

Actual Observations 59 

44. Examples on the Application of Bayes's Rule 61 

45. Criticism of Bayes's Rule 62 

46. Theory versus Practice 64 

47. Probabilities expressed by Integrals 67 

48. Example 24 70 

49. Example 25. Ring's Paradox 72 

50. Conclusion 76 

Chapter VII. 

The Law of Large Numbers. 

51. A Priori and Empirical Probabilities 82 

52. Extent and Usage of Both Methods 85 

53. Average a Priori Probabilities 87 

54. The Theory of Dispersion 88 

55. Historical Development of the Law of Large Numbers 89 

Chapter VIII. 

Introductory Formulas from the Infinitesimal Calculus. 

56. Special Integrals 90 

57. Wallis's Expression of ir as an Infinite Product 90 

58. De Moivre— Stirling's Formula 92 



TABLE OF CONTENTS. XXvil 

Chapteb IX. 

Law of Large Numbers. Mathematical Deduction. 

59. Repeated Trials 96 

60. Most Probable Value 97 

61. Simple Numerical Examples 97 

62. The Most Probable Value in a Series of Repeated Trials 99 

63. Approximate Calculation of the Maximum Term, r„ 101 

64. Expected or Probable Value 102 

65. Summation Method of Laplace. The ilean Error 104 

66. ^Nlean Error of ^'arious Algebraic Expressions 106 

67. Tchebycheff's Theorem 108 

68. The Theorems of Poisson and Bernoulli proved by the Application 

of the Tchebycheffian Criterion 110 

69. BernouUian Scheme 110 

70. Poisson's Scheme Ill 

71. Relation between Empirical Frequency Ratios and Mathematical 

Probabihties 114 

72. Application of the Tchebycheffian Criterion 115 

Chapter X. 

The Theory of Dispersion and the Criterions of Lexis and Charlier. 

73. BernouUian, Poisson and Lexis Series 117 

74. The Mean and Dispersion 118 

74a. Mean or Average Deviation 122 

75. The Lexian Ratio and Charlier Coefficient of Disturbancy 124 

Chapter XI. 

Application to Games of Chance and Statistical Problems. 

76. Correlate between Theory and Practice 127 

77. Homograde and Heterograde Series. Technical Terms 128 

78. Computation of the Mean and the Dispersion in Practice 130 

79. Westergaard's Experiments 136 

80. CharUer's Experiments 137 

81. Experiments by Bonynge and Fisher 141 

CHAPTER XII. 

Continuation of the Application of the Theory of Probabilities to 
Homograde Statistical Series. 

82. General Remarks 146 

83. Analogy between Statistical Data and Mathematical Probabilities. . 147 

84. Number of Comparison and Proportional Factors 149 

85. Child Births in Sweden 151 

86. Child Births in Denmark 152 



XX, 111 TABLE OF CONTENTS. 

87. Danish Marriage Series 153 

88. Stillbirths 154 

89. Coal Mine Fatalities 155 

90. Reduced and Weighted Series in Statistics 157 

91. Secular and Periodical Fluctuations 161 

92. Cancer Statistics 165 

93. Application of the Lexian Dispersion Theory in Actuarial Science. 

Conclusion 167 



PART II. 

FREQUENCY CURVES AND HETEROGRADE 

STATISTICS 

Chapter XIII. 

The Theory of Errors and Frequency Curves and Its Application 
to Statistical Series. General Remarks. 

94. General Remarks. The Hypotheses of Elementary Errors 169 

95. Application to Statistical Series. Definitions 173 

96. Compound Frequency Curves 176 

97. Early Writers 178 

98. Laplace and Gauss 179 

99. Quetelet's Studies 181 

100. Opperman, Gram, and Thiele 182 

101. Modern Investigations 184 



Chapter XIV. 
The Mathematical Theory of Frequency Curves. 

102. Frequency Distributions 188 

103. Parameters Considered as Symmetric Functions 189 

104. Semi-Invariants of Thiele 191 

105. The Fourier Integral Equation 194 

106. Frequency Function as the Solution of an Integral Equation 195 

107. The Normal or Laplacean Probability Function 197 

108. Hermite's Polynomials 199 

109. Orthogonal Functions 200 

110. The Frequency Function Expressed as a Series 202 

111. Derivation of Gram's Series 203 

112. Absolute Frequencies 206 

113. Coefficients Expressed by Semi-Invariants 208 

114. Change of Origin and Unit 210 



TABLE OF CONTENTS. Xxix 

PART III. 
PRACTICAL APPLICATIONS OF THE THEORY. 

Chapter XV. 
The Numerical Determination of the Parameters. 

1 15. General Remarks 215 

116. Remarks on Criticisms 216 

117. Charlier's Computation Scheme 218 

118. Comparison between Observed Data and Theoretical Values 220 

119. Principle of Method of Least Squares 221 

120. Gauss' Solution of Normal Equations 224 

121. Arithmetical Application of Method 225 

Chapter XVI. 
Logarithmically Transformed Frequency Functions. 

122. Transformation of the Variate 235 

123. The General Theory of Transformation 236 

124. Logarithmic Transformation 237 

125. The Mathematical Zero 238 

126. Logarithmically Transformed Frequency Series 239 

127. Parameters Determined by Least Squares 243 

128. Application to Graduation of Mortality Tables 244 

129. Formation of Observation Equations 246 

130. Additional Examples 257 

Chapter X^II. 
Frequency Curves and their Relation to the Bernoullian Series. 

131. The Bernoullian Series 261 

132. Poisson's Exponential 265 

133. The Law of SmaU Numbers 270 

Chapter XVIII. 
Poisson-Charlier Frequency Curves for Integral Variates. 

134. Charlier's B Curve 271 

135. Numerical Examples 273 

136. Transformation of the Variate 274 

137. Bernoullian Series expressed as B Curves 276 

138. Remarks on Mr. Keynes' Criticisms 278 



PART I 

MATHEMATICAL PROBABILITIES AND 
HOMOGRADE STATISTICS 



CHAPTER I. 

INTRODUCTION; GENERAL PRINCIPLES AND PHILOSOPHICAL 

ASPECTS. 

1. Methods of Attack. — The subject of the theory of proba- 
bilities may be attacked in two different ways, namely in a 
philosophical, and in a mathematical manner. At first the 
subject originated as isolated mathematical problems from games 
of chance. The pioneer writers on probability such as Cardano, 
Galileo, Pascal, Fermat, and Huyghens treated it in this way. 
The famous Bernoulli was, perhaps, the first to view the subject 
from the philosopher's point of view. Laplace wrote his well- 
known "Essai Philosophique des Probabilites," wherein he terms 
the whole science of probability as the application of common 
sense. During the last thirty years numerous eminent philo- 
sophical scholars such as Mill, Venn, and Keynes of England, 
Bertrand and Poincare of France, Sigwart, von Kries and Lange 
of Germany, Kroman of Denmark, and several Russian scholars 
have written on the philosophical aspect. 

In the ordinary presentation of the elements of the theory of 
probability as found in most English text-books, the treatment 
is wholly mathematical. The student is given the definition of 
a mathematical probability and the elementary theorems are 
then proved. We shall, in the following chapter, depart from 
this rule and first view the subject, briefly, from a philosophical 
standpoint. What the student may thus lose in time we hope 
he may gain in obtaining a broader view of the fundamental 
principles underlying our science. At the same time, the reader 
who is unacquainted with the science of philosophy or pure logic, 
need not feel alarmed, since not even the most elementary 
knowledge of the principles of formal logic is required for the 
understanding of the following chapter. 

2. Law of Causality. — In a great treatise on the Chinese civihza- 
tion, Oscar Peschel, the German geographer and philosopher, 
makes the following remarks: "Since our intellectual awakening, 
since we have appeared on the arena of history as the creators 
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2 INTRODUCTION. [ 2 

and guardians of the treasures of culture, we have sought after 
only one thing, of the presence of which the Chinese had no 
idea, and for which they would give hardly a bowl of rice. This 
invisible thing we call causality. We have admired a vast 
number of Chinese inventions, but even if we seek through their 
huge treasures of philosophical writing we are not indebted to 
them for a single theory or a single glance into the relation 
between cause and effect." 

The law of causality may be stated broadly as follows : Every- 
thing that happens, and everything that exists, necessarily 
happens or exists as the consequence of a previous state of things. 
This law cannot be proven. It must be taken, a priori, as an 
axiom; but once accepted as a truth it does away with the belief 
of a capricious ruling power, and even if the strongest disbeliever 
of the law may deny its truth in theory he invariably applies it 
in practice during his daily occupation in life. 

All future human activity is more or less influenced by past 
and present conditions. Modern historical writings, as for 
instance the works of the brilliant Italian historian, Ferrero, 
always seek to connect past events with present social and 
economic conditions. Likewise great and constructive statesmen 
in trying to shape the destinies of nations always reckon with 
past and present events and conditions. We often hear the term, 
"a man with foresight," applied to leading financiers and states- 
men. This does not mean that such men are gifted with a vision 
of the future, but simply that they, with a detailed and thorough 
knowledge of past and present events, associated with the par- 
ticular undertaking in which they are interested, have drawn 
conclusions in regard to a future state of affairs. For example, 
when the Canadian Pacific officials, in the early eighties, chose 
Vancouver as the western terminal for the transcontinental 
railroad, at a time when practically the whole site of the present 
metropolis of western Canada was only a vast timber tract, they 
realized that the conditions then prevailing on this particular 
spot — the excellent shipping facilities, the favorable location in 
regard to the Oriental trade, and the natural wealth of the sur- 
rounding country — would bring forth a great city, and their 
predictions came true. 
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Predictions with regard to the future must be taken seriously 
only when they are based upon a thorough knowledge of past 
and present events and conditions. Prophecies, taken in a 
purely biblical sense of the term and viewed from the law of 
causality, are mere guesses which may come true and may not. 
A prophet can hardly be called more than a successful guesser. 
Whether there have been persons gifted with a purely prophetic 
vision is a question which must be left to the theologians to 
wrangle over. 

3. Hypothetical Judgments. — Any person with ordinary in- 
tellectual faculties may, however, predict certain future events 
with absolute certainty by a simple application of the principle 
of hypothetical judgment. The typical form of the hypothetical 
judgment is as follows : If a certain condition exists, or if a certain 
event takes place then another definite event will surely follow. 
Or if .i exists B will invariably follow. 

Mathematical theorems are examples of hA-pothetical judg- 
ments. Thus in the geometry of the plane we start with certain 
ideas (axioms) about the line and plane. From these axioms 
we then deduce the theorems by mere hypothetical judgments. 
Thus in the Euclidian geometry we find the axiom of parallel 
lines, which assumes that through a point only one line can be 
drawn parallel to another given line, and from this assumption 
we then deduce the theorem that the sum of the angles in a 
triangle is 180°. But it must be borne in mind that this proof is 
valid only on the assumption of the actual existence of such lines. 
If we could prove directly by logical reasoning or by actual 
measurement, that the sum of the angles in any triangle is equal 
to 180°, then we would be able to prove the above theorem, the 
so-called "hole in geometry," independently of the axiom of 
parallel lines. 

A Russian mathematician, Lobatschewsky, on the other hand, 
assumed that through a single point an infinite number of parallels 
might be drawn to a previously given line, and from this as- 
sumption he built up a complete and vahd geometry of his own. 
Still another mathematician, Riemann, assumed that no lines 
were parallel to each other, and from this produced a perfectly 
valid surface geometry of the sphere. 
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As examples of hypothetical judgment we have the two follow- 
ing well-known theorems from elementary geometry and algebra. 
If one of the angles of a triangle is divided into two parts, then 
the line of division intersects the opposite side. If a decadian 
number is divided by 5 there is no remainder from the division. 

In natural science, hypothetical judgments are founded on 
certain occurrences (phenomena) which, without exception, have 
taken place in the same manner, as shown by repeated obser- 
vations. The statement that a suspended body will fall when its 
support is removed is a hypothetical judgment derived from 
actual experience and observation. 

4. H3rpothetical Disjunctive Judgments. — In hypothetical 
judgments we are always able to associate cause and effect. It 
happens frequently, however, that our knowledge of a certain 
complex of present conditions and actions is such that we are 
not able to tell beforehand the resulting consequences or effects 
of such conditions and actions, but are able to state only 
that either an event Ei or an event E2, etc., or an event En will 
happen. This represents a hypothetical disjunctive judgment 
whose typical form is : If A exists either Ei, E2, -E3, • • • or -E„ 
will happen. 

If we take a die, i. e., a homogeneous cube whose faces are 
marked with the numbers from one to six, and make an ordinary 
throw, we are not able to tell beforehand which side will turn 
up. True, we have here again a previous state of things, but the 
conditions do not allow such a simple analysis as the cases we 
have hitherto considered under the purely hypothetical judgment. 
Here a multitude of causes influence the final result — the weight 
and centre of gravity of the die, the infinite number of possible 
movements of the hand which throws the die, the force of contact 
with which the die strikes the table, the friction, etc. All these 
causes are so complex that our minds are not afforded an op- 
portunity to grasp and distinguish the impulses that determine 
the fall of the die. In other words we are not able to say, a 
priori, which face will appear. We only know for certain that 
either 1, 2, 3, 4, 5, or 6 will appear. If a line is drawn through 
the vertex of a triangle, it either intersects the opposite side or 
it does not. If a number is divided by 5 the division either gives 
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only an integral number or leaves a remainder. If an opening 
is made in the wall of a vessel partly filled with water, then either 
the water escapes or remains in the vessel. All the above cases 
are examples of hj-pothetical disjunctive judgments. 

The four cases show, however, a common characteristic. They 
all have a certain partial domain, where one of the mutually 
exclusive events is certain to happen, while the other partial 
domain will bring forth the other event, and the total area of 
action embraces both events. Taking the triangle, we notice 
that the lines may pass through all the points inside of an angle 
of 360°, but only the lines faUing inside the internal vertical 
angle, ip, of the triangle will produce the event in question, 
namely the line intersecting the opposite side. There tstII be 
an outflow from the vessel only if the hole is made in that part 
of the wall which is touched by the fluid. 

All problems do not allow of such simple analysis, however, 
as ^N-ill be seen from the following example. Suppose we have 
an urn containing 1 white and 2 black balls and let a person 
draw one from the urn. The hj-pothetical disjunctive judgment 
immediately tells us that the ball will be either black or white, 
but the particular domain of each event cannot be limited to the 
fixed border lines of the former examples. Any one of the balls 
may occupy an infinite number of positions, and furthermore we 
may imagine an infinite number of movements of the hand which 
draws the ball, each movement being associated with a particular 
point of position of the baU in the urn. If we now assume each 
of the three balls to have occupied all possible positions in the 
urn, each point of position being associated •n'ith its proper 
movement of the hand, it is readily seen that a black ball will 
be encountered tmce as often as a white ball in a particular 
point of position in the urn, and for this reason any particular 
movement of the hand which leads to this point of position 
grasps a black ball twice as often as a white ball. 

5. General Definition of the Probability of an Event. — All the 
above examples have shown the following characteristics: 

(1) A total general region or area of action in which all actions 
may take place, this total area being associated with all possible 
events. 
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(2) A limited special domain in which the associated actions 
produce a special event only. 

If these areas and domains, as in the above cases, are of such a 
nature that they allow a purely quantitative determination, 
they may be treated by mathematical analysis. We define 
now, without entering further into its particular logical signifi- 
cance, the ratio of the second special and limited domain to the 
first total region or area as the probability of the happening of 
the event, E, associated with domain No. 2. 

We must, however, hasten to remark that it is only in a com- 
paratively few cases that we are able, a priori, to make such a 
segregation of domains of actions. This may be possible in 
purely abstract examples, as for instance in the example of the 
division of the decadian number by 5. But in all cases where 
organic life enters as a dominant factor we are unable to make such 
sharp distinctions. If we were asked to determine the proba- 
bility of an a:-year-old person being alive one year from now, we 
should be able to form the hypothetical disjunctive judgment: 
An a;-year-old person will be either alive or dead one year from 
now. But a further segregation into special domains as was 
the case with the balls in the urn is not possible. Many ex- 
tremely complex causes enter into such a determination; the 
health of the particular person, the surroundings, the daily life, 
the climate, the social conditions, etc. Our only recourse in 
such cases is to actual observation. By observing a large 
number of persons of the same age, x, we may, in a purely em- 
pirical way, determine the rate of death or survival. Such a deter- 
mination of an unknown probability is called an empirical proba- 
bility. An empirical probability is thus a probability, into the 
determination of which actual experience has entered as a domi- 
nant factor. 

6. Equally Likely Cases. — The main difficulty, in the appli- 
cation of the above definition of probability, lies in the deter- 
mination of the question whether all the events or cases taking 
place in the general area of action may be regarded as equally 
likely or not. Two diametrically opposite views have here been 
brought forward by writers on probabilities. One view is based 
upon the principle which in logic is known as the principle of 
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■ insuflBcient reason," while the other ^■iew is based upon the 
principle of " cogent reason." The classical writers on the theory 
of probability, such as Jacob Bernoulli and Laplace, base the 
theory on the principle of insuiBcient reason exclusively. Thus 
Bernoulli declares the six possible cases by the throw of a die to 
be equally likely, since "on account of the equal form of all the 
faces and on account of the homogeneous structiu-e and equally 
arranged weight of the die. there is no reason to asstune that any 
face should tiu-n up in preference to any other." In one place 
Laplace says that the possible cases are "cases of which we are 
equally ignorant," and in another place, "we have no reason to 
beUeve any particidar case should happen in preference to any 
other. " 

The opposite ^■^ew, based on the principle of cogent reason, 
has been strongly endorsed in an admirable little treatise by the 
German scholar, -Johannes von Kries.^ Von Kries requires, first 
of all, as the main essential in a logical theory of probability, 
that "the arrangement of the equally hkely cases must have a 
cogent reason and not be subject to arbitrary conditions." 

Li several illustrative examples, von Kries shows how the 
principle of insufficient reason may lead to different and paradox- 
ical results. The following example will illustrate the main 
points in von Kries's criticism. Suppose we be given the follow- 
ing problem : Determine the probabihty of the existence of hiunan 
beings on the planet ^lars. By applying the first mentioned 
principle oiu- reasoning woidd be as follows: We have no more 
reason to assume the actual existence of man on the planet than 
the complete absence. Hence the probability for the non- 
existence of a human being, is equal to ^. Next we ask for the 
probability of the presence or non-presence of another earthly 
mammal, say the elephant. The answer is the same, f . Now 
the probability for the absence of both man and elephant on the 
planet i? 2 X § = !•' The probability for the absence of a third 
mammal, the horse, is also §, or the probability for the absence 
of man, elephant, and horse is equal to (i)* = } Proceeding in 
the same manner for aU mammals we obtain a ver\" small proba- 

1 "Die Principien der Wahrscheinlichkeitsrechnuiig."; Berlin, lsS6. 
' 5t>e the chapter on multiplication of probabilities. 
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bility for the complete absence of all mammals on Mars, or a 
very large probability, almost equal to certainty, that the planet 
harbors at least one mammal known on our planet, an answer 
which certainly does not seem plausible. But we might as well 
have put the question from the start: what is the probability 
of the existence or absence of any one earthly mammal on Mars? 
The principle of insufficient reason when applied directly would 
here give the answer 5, while when applied in an indirect manner 
the same method gave an answer very near to certainty. 

An urn is kijown to contain white and black balls, but the 
number of the balls of the two different colors is unknown. What 
is the probability of drawing a white ball? The principle of 
insufficient reason gives us readily the answer: f, while the prin- 
ciple of cogent reason would give the same answer only if it were 
known a priori that there were equal numbers of balls of each 
color in the urn before the drawing took place. Since this 
knowledge is not present a priori, we are not able to give any 
answer, and the problem is considered outside the domain of 
probabilities. There is no doubt that the principle advocated 
by von Kries is the only logical one to apply, and a recent 
treatise on the theory of probability by Professor Bruhns of 
Leipzig"^ also gives the principle of cogent reason the most promi- 
nent place. On the other hand it must be admitted that if the 
principle was to be followed consistently in its very extreme it 
would of course exclude many problems now found in treatises 
on probability and limit the application of our theory consider- 
ably in scope. Still, however, we must agree with von Kries 
that it seems very foolhardy to assign cases of which we are 
absolutely in the dark, as being equally likely to occur. This 
very principle of insufficient reason is in very high degree re- 
sponsible for the somewhat absurd answers to questions on the 
so-called "inverse probabilities," a name which in itself is a great 
misnomer. We shall later in the chapter on "a posteriori" 
probabilities discuss this question in detail. At present we shall 
only warn the student not to judge cases of which he has no 
knowledge whatsoever to be equally likely to occur. The old rule 
"experience is the best teacher" holds here, as everywhere else. 

' " KoUektivmasslehre and Wahrscheinlichkeitsrechnung," Leipzig, 1903. 
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7. Objective and Subjective Probabilities. — In this connection 
it is interesting to note the lucid remarks by the Danish statis- 
tician, Westergaard. "By every well arranged game of chance, 
by lotteries, dice, etc.," Westergaard says, "everything is ar- 
ranged in such a way that the causes influencing each draw or 
throw remain constant as far as possible. The balls are of the 
same size, of the same wood, and have the same density; they are 
carefully mixed and each ball is thus apparently subject to the 
influences of the same causes. However, this is not so. Despite 
all our efforts the balls are different. It is impossible that they 
are of exactly mathematically spherical form. Each ball has its 
special deviation from the mathematical sphere, its special size 
and weight. Xo ball is absolutely similar to any one of the 
others. It is also impossible that they may be situated in the 
same manner in the bag. In short there is a multitude of ap- 
parently insignificant differences which determine that a certain 
definite ball and none of the other balls may be drawn from the 
bag. If such inequalities did not exist one of two things would 
happen. Either all balls would turn up simultaneouslj' or also 
they would all remain in the bag. IMany of these numerous 
causes are so small that they perhaps are invisible to the naked 
eye and completely escape all calculations, but by mutual 
action they may nevertheless produce a visible result." 

It thus appears that a rigorous application of the principle of 
cogent reason seems impossible. However, a compromise 
between this principle and that of the principle of insufficient 
reason may be effected by the following definition of equally 
possible cases, viz. : Equally possible cases are such cases in which 
we, after an exhaustive analysis of the physical laws underlying the 
structure of the complex of causes influencing the special event, are 
led to assume that no particular case tvill occui in preference to any 
other. True, this definition introduces a certain subjective 
element and may therefore be criticized by those readers who 
wish to make the whole theory of probabilities purely objective. 
Yet it seems to me preferable" to the strict application of the 
principle of equal distribution of ignorance. Take again the 
question of the probability of the existence of human beings on 
the planet Mars. The principle of equal distribution of ignorance 



10 INTRODUCTION. 17 

readily gives us without further ado the answer f . Modern astro- 
physical researches have, however, verified physical conditions on 
the planet which make the presence of organic life quite possible, 
and according to such an eminent authority as Mr. Lowell, perhaps 
absolutely certain. Yet these physical investigations are as 
yet not sufficiently complete, and not in such a form that they 
may be subjected to a purely quantitative analysis as far as the 
theory of probabilities is concerned. Viewed from the standpoint 
of the principle of cogent reason any attempt to determine the 
numerical value of the above probability must therefore be put 
aside as futile. This result, negative as it is, seems, however, 
preferable to the absolute guess of f as the probability. 



CHAPTER II. 

HISTORICAL AND BIBLIOGRAPHICAL NOTES. 

8. Pioneer "Writers. — The first attempt to define the measure 
of a probabiUty of a future event is credited to the Greek philos- 
opher, Aristotle. Aristotle calls an event probable when the 
majority, or at least the majority of the most intellectual persons, 
deem it likely to happen. This definition, although not allowing 
a purely quantitative measurement, makes use of a subjective 
judgment. 

The first really mathematical treatment of chance, however, is 
given by the two Italian mathematicians, Cardano and Galileo, 
who both solved several problems relating to the game of dice. 
Cardano, aside from his mathematical occupation, was also a 
professional gambler and had evidently noticed that in all kinds 
of gambling houses cheating was often resorted to. In order 
that the gamester might be fortified against such cheating prac- 
tices, Cardano wrote a little treatise on gambling wherein he 
discussed several mathematical questions connected with the 
different games of dice as played in the Italian gambling houses 
at that time. Galileo, although not a professional gambler, was 
often consulted by a certain Italian nobleman on several problems 
relating to the game of dice, and fortunately the great scholar 
has left some of his investigations in a short memoir. In the 
same manner the two great French mathematicians, Pascal and 
Fermat, were often asked by a professional gamester, the cheva- 
lier de ^Vlere, to apply their mathematical skill to the solution of 
different gambling problems. It was this kind of investigation 
which probably led Pascal to the discovery of the arithmetical 
triangle, and the first rudiments of the combinatorial analysis, 
which had its origin in probability problems, and which later 
evolved into an independent branch of mathematical analysis. 

One of the earliest works from the illustrious Dutch physicist, 
Huyghens, is a small pamphlet entitled "de Ratiociniis in Ludo 
Alese," printed in Leyden in the year 1657. Huyghens' tract is 

11 
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the first attempt of a systematic treatment of the subject. The 
famous Leibnitz also wrote on chance. His first reference to a 
mathematical probability is perhaps in a letter to the philoso- 
pher, WolfF, wherein he discusses the summation of the infinite 
series 1 — 1 + 1 — !+•••• Besides he solved several problems. 

9. Bernoulli, de Moivre and Bayes. — The first extensive 
treatise on the theory as a whole is from the hand of the famous 
Jacob BernouUi. Bernoulli's book, "Ars Conjectandi," marks a 
revolution in the whole theory of chance. The author treats 
the subject from the mathematical as well as from a philo- 
sophical point of view, and shows the manifold applications of 
the new science to practical problems. Among other important 
theorems we here find the famous proposition which has become 
known as the Bernoulli Theorem in the mathematical theory of 
probabilities. Bernoulli's work has recently been translated 
from the Latin into German,^ and a student who is interested in 
the whole theory of probability should not fail to read this 
masterly work. 

The English mathematicians were the next to carry on the 
investigations. Abraham de Moivre, a French Huguenot, and 
one of the most remarkable mathematicians of his time, wrote 
the first English treatise on probabilities.^ This book was cer- 
tainly a worthy product of the masterful mind of its author, and 
may, even today, be read with useful results, although the 
method of demonstration often appears lengthy to the student 
who is accustomed to the powerful tools of modern analysis. 
The high esteem in which the work by de Moivre is held by 
modern writers, is proven by the fact that E. Czuber, the eminent 
Austrian mathematician and actuary, so recently as two years 
ago translated the book into German. A certain problem (see 
Chap. IV) still goes under the name of "The Problem of de 
Moivre" in the modern literature on probability. A contem- 
porary of de Moivre, Stirling, contributed also to the new branch 
of mathematics, and his name also is immortalized in the theory 
of probability by the formula which bears his name, and by which 
we are able to express large factorials to a very accurate degree 
of approximation. The third important English contributor is 

' Ars Conjectandi, Ostwald's Klassiker No. 108, Leipzig, 1901. 
''■ de Moivre: "The Doctrine of Chances," London, 1781. 
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the Oxford clergyman, T. Bayes. Bayes' treatise, which was 
published after his death by Price, in Philosophical Transactions 
for 1764, deals with the determination of the a posteriori proba- 
bilities, and marks a very important stepping stone in our whole 
theory. Unfortunately the rule known as Bayes' Rule has been 
applied very carelessly, and that mostly by some of Bayes' own 
countrymen; so the whole theory of Bayes has been repudi- 
ated by certain modern writers. A recent contribution by the 
Danish philosophical writer. Dr. Kroman, seems, however, to have 
cleared up all doubts on the subject, and to have given Bayes his 
proper credit. 

10. Application to Statistical Data. — In the eighteenth century 
some of the most celebrated mathematicians investigated 
problems in the theory of probability. The birth of life as- 
surance gave the whole theory an important application to 
social problems and the increasing desire for the collection of all 
kinds of statistical data by governmental bodies all over Europe 
gave the mathematicians some highly interesting material to 
which to apply their theories. No wonder, therefore, that we 
in this period find the names of some of the most illustrious mathe- 
maticians of that time, such as Daniel Bernoulli, Euler, Nicolas 
and John Bernoulli, Simpson, D'Alembert and Buffon, closely 
connected with the solution of problems in the theory of mathe- 
matical probabilities. We shall not attempt to give an account 
of the diherent works of these scientists, but shall only dwell 
briefly on the labors of Bernoulli and D'Alembert. In a memoir 
in the St. Petersburg Academy, Daniel Bernoulli is the first to 
discuss the so called St. Petersburg Problem, one of the most 
hotly debated in the whole realm of our science. We may here 
mention that this problem is today one of the main pillars in the 
economic treatment of value Bernoulli introduced in the dis- 
cussion of the above mentioned problem the idea of the "moral 
expectation," which under slightly different names appears in 
nearly all standard writings on economics. 

D'Alembert is especially remembered for the critical attitude 
he took towards the whole theory. Although one of the most 
brilliant thinkers of his age, the versatile Frenchman made some 
great blunders in his attempt to criticize the theories of chance. 
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Buffon's name is remembered because of the needle problem, 
and he may properly be called the father of the so-called "ge- 
ometrical" or "local" probabilities. 

11. Laplace and Modem Writers. — We now come to that 
resplendent genius in the investigation of the mathematical 
theory of chance, the immortal Laplace, who in his great work, 
"Theorie Analytique des Probabilites, " gave the final mathe- 
matical treatment of the subject. This massive volume leaves 
nothing to be desired and is still today — more than one hundred 
years after its first publication — a most valuable mine of in- 
formation and compares favorably with much more modern 
treatises. But like all mines, it requires to be mined and is by 
no means easy reading for a beginner. An elementary extract, 
"Essai Philosophique des Probabilites," containing the more 
elementary parts of Laplace's greater work and stripped of all 
mathematical formulas has recently appeared in an English 
translation. 

Among later French works, Cournot's "Exposition de la 
Theorie des Chances et des Probabilites" (1843), treated the 
principal questions in the application of the theory to practical 
problems in sociology. In 1837 Poisson published his "Re- 
cherches sur les Probabilites " in which he for the first time proved 
the famous theorem which bears his name. Poisson and his 
Belgian contemporary, Quetelet, made extensive use of the 
theory in the treatment of statistical data. 

Among the most recent French works, we mention especially 
Bertrand's "Calcul des Probabilites" (Paris, 1888), Poincar^'s 
"Calcul des Probabilites" (Paris, 1896), and Borel's "Calcul des 
Probabilites" (Paris, 1901). We especially recommend Poin- 
care's brilliant little treatise to every student who masters the 
French language, as this book makes no departure from the 
lively and elucidating manner in which this able mathematical 
writer treated the numerous subjects on which he wrote during 
his long and brilliant career as a mathematician. 

Of Russian writers, the mathematician, Tchebycheff, has given 
some extensive general theorems relating to the law of large 
numbers. Unfortunately Tchebycheff 's writings are for the 
most part scattered in French, German, Scandinavian and 
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Russian journals, and thus are not easily accessible to the ordinary- 
reader. A Russian artillery ofScer, Sabudski, has recently pub- 
lished a treatise on ballistics in German, wherein he extends the 
views formulated by Tchebycheff. 

Of Scandinavian writers we mention T. N. Thiele, who prob- 
ably was the first to publish a systematic treatise on skew curves.-^ 
An abridged edition of this very original work has recently been 
translated into English.^ The Dane, Westergaard, is the author 
of the most extensive and thorough treatise on vital statistics 
which we possess at the present time. Westergaard's work has 
recently been translated into German,' and is strongly recom- 
mended to the student of vital statistics on account of his clear 
and attractive style of presenting this important subject. 

The Swedish mathematicians Charlier and Gylden have 
published a series of memoirs in different Scandinavian journals 
and scientific transactions. We may also, in this category, 
mention the numerous small articles by the eminent Danish 
actuary. Dr. Gram. 

While the German mathematicians in general are the most 
fertile writers on almost every branch of pure and applied mathe- 
matics, they have not shown much activity in the theory of 
mathematical probability except in the past ten years. But 
during that time there has appeared at least a dozen standard 
works in German. Among these, the lucid and terse treatise 
by E. Czuber, the Austrian actuary and mathematician, is 
especially attractive to the beginner on account of the systematic 
treatment of the whole subject.^ A very original treatment is 
offered by H. Bruhns in his " Kollektivmasslehre und Wahrschein- 
lichkeitsrechnung" (Leipzig, 1903). Among the German works, 
we may also mention the book by Dr. Norman Herz in " Samm- 
lung Schubert," and an excellent little work by Hack in the small 
pocket edition of "Sammlung Goschen." The theory of skew 
curves and correlation is presented by Lipps and Bruhns in 
extensive treatises. 

1 " Almindelig lagttagelseslaere," Copenhagen, 1884. 

2 "Theory of Observations," London, 1903. 
' " MortaUtat und MorbiUtat,'' Jena, 1902. 

*E. Czuber, "Wahrscheinhchkeitsreohnung," Leipzig, 1908 anfi 1910, 2 
volumes. 
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We finally come to modern English writers on the subject. 
After the appearance of de ]Moivre's "Doctrine of Chances" 
the first work of importance was the book by de Morgan "An 
Essay on the Theory of Probabilities." The latest text-book is 
Whitworth's "Choice and Chance" (Oxford Press, 1904); but 
none of these works, although very excellent in their manner of 
treatment of the subject, comes up to the French, Scandinavian, 
and German text-books. Nevertheless, some of the most im- 
portant contributions to the whole theory have been made by 
the English statisticians and mathematicians, Crofton, Pearson, 
and Edgeworth. Especially have frequency curA'es and cor- 
relation methods introduced by Professor Karl Pearson been 
very extensively used in direct applications to statistical and 
biological problems. Of purely statistical writers, we may 
mention G. Udny Yule, who has published a short treatise en- 
titled "Theory of Statistics" (London, 1911). Numerous ex- 
cellent memoirs have also appeared in the different English and 
American mathematical journals and statistical periodicals, 
especially in the quarterly publication, Biometrika, edited by 
Professor Karl Pearson. 

In the above brief sketch, we have only mentioned the most 
important contributors to the theory of probabilities proper. 
Numerous able writers ha-\'e written on the related subject of 
least squares, the mathematical theory of statistics and insurance 
mathematics. We shall not discuss the works of these inves- 
tigators at the present stage. Each of the most important works 
in the above mentioned branches will receive a short review in 
the corresponding chapters on statistics and assurance mathe- 
matics. The readers interested in the historical development of 
the theory of probabilities are advised to consult the special 
treatises on this subject by Todhunter and Czuber.^ 

' After this chapter had gone to jiress I notice that a treatise by the emi- 
nent Enghsh scholar, Mr. Keynes, is being prepared by The Macmillan Co. 
In this connection I wish also to call attention to the recent publication by 
Bachelier (Calcul des probabilites, 1912), a work planned on a broad and 
extensive scale. — A. F. 



CHAPTER III. 

THE MATHEMATICAL THEORY OF PROBABILITIES. 

12. Definition of Mathematical Probability. — " If our positive 
Knowledge of the effect of a complex of causes is such that we 
may assume, a priori, t cases as being equally likely to occur, but 
of which only/, (J < t), cases are favorable in causing the event, 
E, In which we are interested, then we define the proper fraction: 
f/t = p a,s the mathematical probability of the happening of 
the event, E" (Czuber). We might also have defined an a 
priori probability as the ratio of the equally favorable cases to 
the co-ordinated possible cases. 

As is readily seen, this definition assumes a certain a priori 
knowledge of the possible and favorable conditions of the event 
in question, and the probability thus defined is therefore called 
"a priori probability." Denoting the event by the symbol, E, 
we express the probability of its occurrence by the symbol P{E), 
and the probability of its non-occurrence by P{E). Thus if t is 
the total number of equally possible cases and / the number of 
favorable cases for the event, we have: 



and 



P{E) = j=V> 



P{E) = ^= 1 - I = 1 - p = 1 - P{E). 



This relation evidently gives us: P(£) + P(£) = 1, which is the 
symbolic expression for the hypothetical disjunctive judgment 
that the event E will either happen or not happen. If / = t, we 
have: 

P(E) = 1=1, 

which is the symbol for the hypothetical judgment that if A 
exists, E will surely happen. Similarly if / = 0, we get 
3 17 
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P{E) = 7=0, 

or the symbol for the hypothetical judgment: If A exists, E will 
not happen, or what is the same, E will happen. 

As we have already mentioned, in an a priori determination of 
a probability, special stress must be laid upon the requirement 
that all possible cases must be equally likely to occur. The 
enumeration of these cases is by no means so easy as may appear 
at first sight. Even in the most simple problems wh^re there 
can be doubt about the possible cases being equally likely to 
occur, it is very easy to make a mistake, and some of the most 
eminent mathematicians and most acute thinkers have drawn 
erroneous conclusions in this respect. We shall give a few ex- 
amples of such errors from the literature on the subject of the 
theory of probabilities, not on account of their historical interest 
alone, but also for the benefit of the novice who naturally is ex- 
posed to such errors. 

13. Example 1. — An Italian nobleman, a professional gambler 
and an amateur mathematician, had, by continued observation 
of a game with three dice, noticed that the sum of 10 appeared 
more often than the sum of 9. He expressed his surprise at this 
to Galileo and asked for an explanation. The nobleman re- 
garded the following combinations as favorable for the throw of 9: 

1 2 6 

1 3 5 

1 4 4 

2 2 5 

2 3 4 

3 3 3 

and for the throw of 10 the six combinations of: 

1 3 6 

1 4 5 

2 2 6 
2 3 5 

2 4 4 

3 3 4 
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Galileo shows in a treatise entitled " Considerazione sopra il 
giuco dei dadi" that these combinations cannot be regarded as 
being equally likely. By painting each of the three dice with 
the different color it is easy to see that an arrangement such as 
12 6 can be produced in 6 different ways. Let the colors be 
white, black and red respectively. We may then make the 
following arrangements : 



aite 


: Black 


Red 


1 


2 


6 


1 


6 


2 


2 


1 


6 


2 


6 


1 


6 


1 


2 


6 


2 


1 



which gives 3! = 6 different arrangements. The arrangements 
of 1 4 4 can be made as follows : 

White Black Red 
1 4 4 

4 1 4 
4 4 1 

which gives 3 different arrangements. The arrangements of 
3 3 3 can be made in one way only. By complete enumeration 
of equally favorable cases we obtain the following scheme: 

Sum 9 cases Sum 10 cases 

1,2,6 6 1,3,6 6 

1, 3, 5 6 1, 4, 5 6 
1,4,4 3 2,2,6 3 

2, 2, 5 3 2, 3, 5 6 
2,3,4 6 2,4,4 3 
3,3,3 J. 3,3,4 2_ 

25 27 

The total number of equally possible cases by the different ar- 
rangements of the 18 faces on the dice is 6^ = 216. The prob- 
ability of throwing 9 with three dice is therefore gVe, of throwing 
in — -2-7- = 1 
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14. Example 2.— D'Alembert, the great French mathematician 
and natural philosopher and one of the ablest thinkers of his 
time, assigned f as the probability of throwing head at least 
once in two successive throws with a homogeneous coin. D'Alem- 
bert reasons as follows: If head appears first the game is 
finished and a second throw is not necessary. He therefore gives 
as equally possible cases (we denote head by H and tail by T) : 
H, TH, TT, and determines thus the probability as f . Where 
then is the error of D'Alembert? At first glance the chain of 
reasoning seems perfect. There are altogether three possible 
cases of which two are in favor of the event. But are the three 
cases equally likely? To throw head in a single throw is evi- 
dently not the same as to throw head in two successive throws. 
D'Alembert has left out of consideration the fact that a double 
throw is allowed. The following analysis shows all the equally 
possible cases which may occur: 

IIH, HT, TH, TT. 

Three of those cases favor the event. Hence we have: 

PiE) = p = f . 

We shall return to this problem at a later stage under the dis- 
cussion of the law of large numbers. 

The examples quoted have already shown that the enumer- 
ation of the equally likely cases requires a sharp distinction 
between the different combinations and arrangements of ele- 
ments. In other words, the solution of the problems requires 
a knowledge of permutations and combinations. We assume 
here that the reader is already acquainted with the elements and 
formulas from the combinatorial analysis and shall therefore 
proceed with some more illustrations. In the following, when 
employing the binomial coefficients, we shall use the notation 



1,1 instead of "'Ck- 



15. Example 3. — An urn contains a white and b black balls. A 
person draws k balls. What is the probability of drawing a 
white and /3 black balls? 

{a + 13 = Ic, a ma, P^h) 
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A- balls may be drawn from the urn in as many ways as it is possible 
to select k elements from a + b elements, which may be done in 

(a+b\ _ (a+b\ 

ways. Furthermore there are I I groups of a white and I ^ I 

groups of /3 black balls. Since each combination of any one 
group of tjie first groups with any one group of the second groups 
is favorable for the event, we have as favorable cases: 

. /a\ ib\ „ „ \a} ^ \b} 

Xa + p) 

Example 4. A special case of the above problem is the fol- 
lowing question which often appears in the well known game of 
whist. What are the respective chances that 0, 1, 2, 3, 4 aces 
are held by a specified player? There are altogether 52 cards 
in the game equally distributed among 4 players. Of these 
cards 4 are aces and 48 are non-aces. Hence we have the fol- 
lowing values for a, b, k, a and /3. 

a = 4, 6 = 48, />; = 13, a = 0, 1, 2, 3, 4, /3 = 13, 12, 11, 10, 9. 

Substituting in the above formula we get: 

82251 



P''=(o)x(l3)^(l3) = 

/4\ /48\ /52\ 1 
?'^=(l)x(l2)^(l3) = 2- 

/4\ /48\ /52\ t 
^=(2)x(ll)^(l3) = 2 

/4\ /48\ /52V ] 



270725' 
A hypothetical disjunctive judgment immediately tells us that in 
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a game of whist a specified player must either hold 0, 1, 2, 3 or 
4 aces. Any such judgment is certain to come true. Hence by 
adding the 5 above computed probabilities we obtain a check 
for the accuracy of our calculations. The actual addition of the 
numerical values of po, pi, p-i, pz, and pi gives us unity which is 
the mathematical symbol for certainty. Gauss, the renowned 
German mathematician and astronomer, was an eager whist 
player. During his forty-eight years of residence in the university 
town of Gottingen almost every evening he played a rubber of 
whist with some friends among the university professors. He 
kept a careful record of the distribution of the aces in each 
game. After his death these records were found among his 
papers, headed "Aces in A^^list." The actual records agree 
with the results computed above. 

16. Example 5. — An urn contains n similar balls. A part of 
or all the balls are drawn. What is the probability of drawing 
an even number of balls? 

One ball may be drawn in as many ways as there are balls, 
two balls in as many ways as we may select two elements out of 
n elements, and so on. Hence we have for the total number of 
equally possible cases: 

We have now: 

('+')"='+n+(")+-+(:). 

and 

<i-'>"-'-(n+(;')--+(- ')•(:)■ 

The number of favorable cases is given by the expansion: 



/=(")+(:;)+ 



The expression for t is the binominal coefficients less unity. 
Hence we have: 

i = (1 + 1)» - 1 = 2" - 1. 

If we add the two expansions of (1 + 1)" and (1 — 1)" and then 
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subtract 2 we get the expansion for 2/. Hence we have : 

■2f = [(1+ 1)" + (1 - 1)" - 2] .-. / = 2"-i - 1. 

Thus we shall have as the probabihty of drawing an even number 
of balls: 

On— 1 2 

while for an uneven number: 

Oi«-l 

9= 1-P = ^^^- 



We notice that the probabihty of drawing an uneven nimiber of 
balls is larger than the probability of drawing an even number. 
This apparently strange result is easUy explained without the 
aid of algebra from the fact that when the urn contains one ball 
only, we cannot draw an even number. Hence we have p = 0, 
5=1. With two balls we may draw an uneven number in two 
ways and an even number in one way, thus p = f . and q = ^■ 
The greater weight of q remains when n is finite; only when 

n = X, p = q = ^_ 

17. Example 6. — A box contains n balls marked 1, 2, .3, • • • n. 
A p>erson draws n balls in succession and none of the balls thus 
drawn is put back in the m-n. Each drawing is consecutively 
marked 1, 2, 3, • • • " on n cards, ^"hat is the probability that 
no ball marked a (a = 1, 2. 3. ■ ■ ■ n) appears simultaneously 
with a drawing card marked a? 

The number of equally jwssible cases is simply the number of 
permutations of ?! elements which is equal to n! 

The niunber of favorable cases is given by the total number 
of derangements or relative permutations of ;( elements, i. e., 
such permutations wherein the numbers from 1 to n do not app>ear 
in their natural places. The formula for such relative permuta- 
tions was first given by Euler in a memoir of the St. Petersburg 
Academy entitled 'Quaestio Curiosa ex Doctrina Combina- 
tionis."' Euler makes use of a recursion formula. A German 
mathematician, Lampe, has, however, derived the formula in a 
simpler manner in "Grunert's Archives ' for 1S'>4. 
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Lampe denotes bj"^ the symbol (p{l) the number of permuta- 
tions wherein 1 does not appear in its natural place. By letting 1 
remain fixed in the first place we obtain (n — 1) ! permutations of 
the other remaining elements, or: 

<pil)„ = nl- (n- 1)! 

permutations where 1 is out of place. Of these permutations 
there are, however, a number wherein 2 appears in its natural 
place. If we let 2 remain fixed in this place we shall have: 

^(l)„_i= {n - 1)1 - in - 2)1 

permutations wherein 2 is in its place but 1 out of place, there 
remains thus: 

<p{2)n = <p{l)n - ^(l)n-i = n! - 2(n - 1)! + (n - 2)! 

permutations in which neither 1 nor 2 is in its natural place. 
Letting 3 remain fixed in its place, the remaining n — 1 elements 
give: 

(re- 1)! - 2(?i- 2)!+ (n- 3)! 

cases where 3 is in its place but 1 and 2 are not. Accordingly 
there will be: 

^(3)„ = ^(2)„ - ^(2)„_i = n! - 3(re- 1)!+ 3(n- 2)! -(n-3)! 

permutations in which none of the three elements 1, 2, and 3 is 
in its place. The complete deduction gives us now for the 
number r: 

<p{r)n = n! - ([) (n - 1)! + (2) (w - 2)! 

+ {-iy(\){n-r)\ 

arrangements in which none of the numbers 1, 2, 3, • • • r is in 
its place. Hence the required probability is: 

<p(r)n _ _ /nl , /n 1 

n\ \\tn^\2tn{n-l) 

■^ ^~ ^^\r)n(ji- 1) ••• (n- r-l- 1)' 
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when n = r the above expression becomes: 

or the probability that none of the balls appear in its numerical 
order. 

^Vhen 71 = oc the above expression converges towards e~^ as 
a limit. Since the series is rapidly convergent, we may therefore 
as an approximate value let 

p = e-i= 0.36788 ■■•. 

The probability that at least one ball appears in numerical order 
is 

q= I- p= 0.63213 •••. 



CHAPTER IV. 

THE ADDITION AND MULTIPLICATION THEOREMS IN 
PROBABILITIES. 

18. Systematic Treatment by Laplace. — The reader will readily 
have noticed that the problems hitherto considered have been 
solved by a direct application of the fundamental definition of a 
mathematical probability. Almost every branch of pure and 
applied mathematics has originated in this manner. A few 
isolated problems, apparently having no mutual connection what- 
soever, have presented themselves to different mathematicians. 
As the number of problems increased, there was found to exist 
a certain inner relation between them, and from the mere isolated 
cases there grew a systematic treatment of an entirely new 
subject. 

The theory of probabilities had its origin in games; and the 
different problems that arose, were treated individually. From 
the time of Galileo and Cardano to the appearance of Laplace's 
great treatise, a number of celebrated mathematicians such as 
Pascal, Fermat, Huyghens, De ]\Ioivre, Stirling, Bernoulli and 
others had solved numerous problems, some of these, as we already 
have seen in the preceding chapter, of a quite complex nature. 
But none of these mathematicians had hitherto succeeded in 
giving a systematic treatment of the subject as a whole. All 
their treatises were, as any one taking the trouble to look over 
the works of De Moivre and Bernoulli will readily notice, mere 
collections of examples solved by direct application of our funda- 
mental definition. It remained for Laplace first to give the 
definite rules to the science bj^ which the solution of a great 
number of problems, often very complicated, was reduced to 
the application of a few stable principles, first given in his 
"Theorie Analytique des Probabilites " (Paris, 1812). 

19. Definition of Technical Terms. — Before entering into a 
demonstration of Laplace's theorems it will, however, be neces- 
sary to explain a few technical terms which seem commonplace 

26 
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and simple enough but which, nevertheless, must be defined 
clearly in order to avoid any ambiguity. 

In all works on probabilities when speaking of happenings of 
various events we encounter often the terms, independent events, 
dependent events and mutually exclusive events. An event E is 
said to be independent of another event F when the actual 
happening of F does not influence in any degree whatsoever the 
probability of the happening of E. On the other hand, if the 
probability of E is dependent on or influenced by the previous 
happening of F, then E is said to be dependent on F. Finally the 
two events E and F are said to be mutually exclusive when 
through the occurrence of one of them, say F, the other event 
E cannot take place, or \'ice versa. We might also in this case 
consider the two events E and F as members of a complete dis- 
junction. In a complete hypothetical disjunctive judgment as 
"^Mien a die is thrown either 1, 2, 3, 4, 5 or 6 will turn up" 
each member represents a possible event. Any one of these 
events is mutually exclusive in respect to the other events of the 
disjunction. 

20. The Theorem of the Complete or Total Probability, or the 
Probability of " Either Or." — When an event, E, may happen in 
any one of the n different and mutually exclusive ways -Ei, E^, 
Ez, ■ ■ ■ En with the respective probabilities: pi, p^, pz, ■ ■ ■ Pn, 
then the probability for the happening of the event, E, is equal 
to the sum of the individual probabilities: pi, Pi, pz, ■ ■ ■ p„. 

Proof: The main event, E, falls in n groups of subsidiary events 
of which only one can happen in a single trial but of which any 
one will bring forth the event E. Let us by t denote the total 
number of equally possible cases. Of these possible cases / are 
in favor of the event. This favorable group of cases may now 
be divided into n sub-groups of which / are favorable for the 
happening of Ei, ji in favor of Ei, /s in favor of Ez ■ ■ ■ fn in 
favor of En- ^Mien we write: 

/ /1 + /2+/3+ •••+/. /l,/2 

P{E) = P = ^ = ^ =7 + 7 

4-tl \ j_ ^ 

"^ t'^"''^t' 
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Each of the fractions fjt (a = 1, 2, 3, ■ ■ ■ n) represents the 
respective probabilities for the actual occurrence of the subsidiary 
events, Ei, E2, E3, ■ ■ ■ En- Hence we shall have 

P{E) = p= Pl+ P2+ P3+ ■■• + Pn- 

This theorem is also known as the Addition Theorem of proba- 
bilities. Instead of "total probability" the German scholar, 
Reuschle, has suggested the expressive name of the "either or" 
probability. The term is well selected when we remember that 
the event, E, will happen when either Ei, or E2 or E3 ■ ■ ■ or jB„ 
happens. 

Example 7. — What is the probability to throw 8 with two dice 
in a single throw? 

The total number of ways is < = 6^ = 36. The event in ques- 
tion E is composed of the three subsidiary events favoring the 
combination of 8: 

Ei: 6, 2 

E2: 5, 3 



Now 



Er. 4, 4. 



21 1 2! 1 1 

^^-^^^ = 36=18' ■^^■^^^ = 36 = 18' -^^^') = 36- 



Hence 



Pr,^^ = l + l + l=A 
^ '' 18 "^18 "^36 36" 



21. Theorem of the Compound Probability or the Probability 
of " As Well As." — An event E may happen when every one of 
the mutually exclusive events Ei, E2, E3, • > • En has occurred 
previously. It is immaterial if the n subsidiary events have 
happened simultaneously or in succession. But it makes a 
difference if the events Ei, E2, E3, • • ■ En are independent, or 
dependent on each other. 

1. Independent Emntt. — The probability, P{E) = p, for the 
simultaneous or consecutive appearance of several mutually ex- 
clusive events: 7?i, E2, • • • -E„ is equal to the product: pi-p^-ps- 
•• • Pn of the individual probabilities of the n events. 

Proof: Let the number of possible cases entering into the 
complex that brings forth the event E be t. Each of the ti 
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possible cases corresponding to the event Ei may occur simul- 
taneously with each one of the U cases corresponding to the event 
E2. Thus we have altogether h X <2 cases falling on Ei and E2 
at the same time. Continuing in the same way of reasoning it 
is readily seen that the total number of equally possible cases 
resulting from the simultaneous occurrence of the events Ei, E2, 
E3, •■■■En is equal to <i X ^3 X <3 X • • • tn- By applying the same 
reasoning to the favorable cases we get as their total number: 

/ = /iX/2X/3X •••/„. 

Hence the final probability for the happening of the simultaneous 
or consecutive appearance of the n minor events is: 

P(£) = 7 = 7xf'x{-X •••f = P1XP2XP3X ■■■pn. 
I Ii I2 H tn 

Example 8. — A card is drawn from a whist deck, another card 
is drawn from a pinochle deck. What is the probability that 
they both are aces? 

A whist deck contains 52 cards of which four are aces, a 
pinochle deck 48 cards with 8 aces. Denoting the probabilities 
of getting an ace from the whist and pinochle decks by P(Ei) 
and P{E2) respectively we have: 

P{E) = P{E,)P{E2) =^X~ = ~. 

2. Dependent Events. — The n events Ei, Ei, Ez, • • • -E„ are 
not independent of each other, but are related in such a. way that 
the appearance of E\ influences E2, that event influences in turn 
Ez, Ez event Ei and so on. 

The same reason holds as above, and, 

PiE) = 2> = pi X 2J2 X P3 X ■■■ Pn. 

But p2 means here the probability for the happening of E2 after 
the actual occurrence of Ei, pz the probability for the happening 
of Ez after Ei and E2 have pieviously happened, and so on for 
all n events. 

Example 9. — A card is drawn from a whist deck and replaced 
by a joker, and then a second card is drawn. What is the prob- 
abilitv that both cards are aces? 
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Denoting the two subsidiary events by Ei and E2 we have: 

4 3 3 3 



P{E) = P{E0PiE2) = 



52 52 13 X 52 676 ' 



The two above theorems are known as the multiplication theorems 
in probabilities. Reuschle has also suggested the name " the 
as well as probability." 

22. Poincare's Proof of the Addition and Multiplication 
Theorem. — The French mathematician and physicist, H. 
Poincare, has derived the above theorems in a new and elegant 
manner in his excellent little treatise: " Lecons sur le Calcul des 
Probabilites," Paris, 1896. 

Poincare's proof is briefly as follows: 

Let El and E2 be two arbitrary events. 
El and E2 may happen in a different ways. 
El may happen but not E2 in /3 different ways. 
E2 may happen but not Ei in y different ways. 
Neither Ei nor E2 will happen in 5 different ways. 
We assume the total a + |8 + 7 + 5 cases to be equally likely to 

occur. 
The probabihty for the occurrence of Ei is 

^' a + p + y + d- 
The probability for the occurrence of E2 is 
a + 7 

The probability for the occurrence of at least one of the events Ei 
and E2 is 

a + g + 7 
^' a + ^ + y + d- 

The probability for the occurrence of both Ei and E2 is 

a 

P'^ a + ^ + y+8- 

The probability for the occurrence of Ei when E2 has already oc- 
curred is 

a 

a + y 
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The probability for the occurrence of E2 when Ei has already oc- 
curred is 

a 

P6 = 



The probability for the occurrence of Ei when £2 has not already 
occurred is 

The probability for the occurrence of E2 when Ei has not already 
occurred is 



Ps = 



We have now the following identical relations: 

Pl + P2= P3 + Pi, P3 = Pl+ P2 — Pi, 

i. e., the probability that of two arbitrary events at least one 
will happen is equal to the probability that the first will happen 
plus the probability that the second will happen less the prob- 
ability that both will happen. The particular problem which 
we may happen to investigate may possibly be of such a nature 
that the two events Ei and E2 cannot happen at the same time, 
in that case pi = 0, and we get : 

Ps = Pi + Pi- 

In this equation we immediately recognize the addition theorem 
for two mutually exclusive events. By substitution of the 
proper values we have furthermore: 

Pi= P2 ■ Pi or ^4 = pi • P6- 

These equations contain the theorems proved under § 21, of 
the probability for two mutually dependent events. 

23. Relative Probabilities. — We shall now finally give an alter- 
native demonstration of the same two theorems. It will, of 
course, be of benefit to the student to see the subject from as 
many view points as possible; moreover, the following remarks 
will contain some very useful hints for the solution of more com- 
plicated problems by the application of so-called " relative prob- 
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abilities "and a few elementary theorems from the calculus of 
logic. The following paragraphs are mainly based upon a 
treatise in the Proceedings of the Royal Academy of Saxony, 
by the German mathematician and actuary, F. Hausdorff. 

In our fundamental definition of a mathematical probability 
for the happening of an event E, expressed in symbols by P{E), 
as the ratio of the equally favorable and equally possible cases 
resulting from a general complex of causes, we were able to 
compute the so-called ordinary or absolute probabilities. But 
if we, from among the favorable cases and possible cases, select 
only such as bring forward a certain different event, say F, then 
we obtain the " relative probability " for the happening of E 
under the assumption that the subsidiary event, F, has occurred 
previously. For this relative probability we shall employ the 
symbol Pp{E), which reads "the relative probability of E, 
positi F." The following problem illustrates the meaning of 
relative probabilities. If an honor card is drawn from an 
ordinary deck of cards, what is the probability that it is a king? 
Denoting the subsidiary event of drawing an honor card by F, 
and the main event of drawing a king by E, we may write the 
above mentioned probability in the symbolic form: Pip{E). If 
on the other hand we knew a priori that a king was drawn, we 
may also ask for the probability of having drawn an honor card. 
Since any king also is an honor card, we may write in symbols: 
P^{F) = 1. 

Before entering upon the immediate determination of relative 
probabilities we shall first define a few symbols from the calculus 
of logic. We denote first of all the occurrence of an event E 
by E, the non-occurrence of the same event by E. Similarly 
we have for the occurrence and non-occurrence of other events, 
F, G, H, ■ ■ ■ and F, G, II, ■■■. E + F means that at least one 
of the two events E and F will happen. E X F or simply E ■ F 
means the occurrence of both E and F. From the above 
definition it follows immediately that E -\- F = E ■ F and 
E= E ■ F^- E -F. 

This last relation simply states that E will happen when either 
E and F happen simultaneously or when E and the non-appear- 
ance of F happen at the same time. If furthermore F\, Fi, F2, 
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I'\ ■ • • Fn, F„ constitute the members of a complete disjunction, 
i. e., mutually exclusive events, we have in general: 

E = E- F^-^E-F, + E- F. + E-F.+ ■■■E-F,, + E J\. 

From the original definition of a probability, it follows now : 

P{E) = P{E ■ F) + P{E ■ F), 
and 

P{E) = P{E ■ F{) + P{E ■ Fi) + PiE ■ F2) + P{E ■ ¥2) 

+ P{E ■ Fn) + PiE ■ ¥\), 

i. e., the probability that of several mutually exclusi-\'e events 
one at least will happen is the sum of the probabilities of the 
happening of the separate events. This is the symbolic form for 
the addition theorem. 

24. Multiplication Theorem. — We next take two arbitrary 
events. From these events we may form the following com- 
binations : 

E ■ F, E ■¥,£ ■ F, E -¥,'1.^., 
Both E and F happen, 
E happens but not F 
F happens but not E 
Neither E nor F happens. 

Furthermore let a, j3, 7, 5, be the respective numbers of the 
favorable cases for the above four combinations of the events 
E and F. Following the previous method of Poincare, we shall 
have: 

^ ' a + ^ + T + 5' ^ ' a + )3 + -y + 5' 

nE.F)= ^^^l^^^ . 

25. Probability of Repetitions. — From the above equations it 
immediately follows: 

P{E ■ F) = PiE) X PEiF) = PiF) X PAE), 

which is the symbolic form for the multiplication theorems of 
compound probabilities. 
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In special cases it may happen that the different subsidiary 
events : Ei, E2, E^ ■ ■ ■ En are all similar. We shall then have, 
following the symbolic method: 

E = El- E,- Ei--- En = E,- El- El--- Ei = Ei", 
and 

P{E) = P(£i") = PC^i)"- 

This gives us the following theorem: 

The probability for the repetition n times of a certain event, 
E, is equal to the wth power of its absolute probability. 

Thus if P{E) = p we have immediately P{E) = 1 — p. 

PiE"") = P{EY = f\ 
P{E'') = P[E)^ = (1 - pY. 

Thus the probability for the occurrence of E at least once in 
n trials is 

P(^ + £ + --- n times) = 1 - P(^") = 1 - (1 - p)". 

Denoting the numerical quantity of this probability by Q we 
have: 

1 - Q = (1 - pY. 

Solving this equation for n we shall have: 

log (1 - Q) 



n = 



log (1 - p) ■ 



Whenever n equals, or is greater than, the above logarithmic 
value for given values of Q and p we are sure that Q will exceed 
a previously given proper fraction. To illustrate: 

Example 10. — How often must a die be thrown so that the 
probability that a six appears at least once is greater than |? 

Here p = \, Q = \- Hence we must select for n the smallest 
positive integer satisfying the relation: 



log (1 - i) log 1 .301035 



log (1 - I) log f .079186 
For this particular value, of n we have in reality 
Q = 1 - (1)^ = .518. 



i. e., n = 4. 
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26. Application of the Addition and Mtdtiplication Theorems 
in Problems in Probabilities. — We shall next proceed to illustrate 
the theorems of the preceding paragraphs by a few examples. 
First, we shall apply the demonstrated theorems to some of the 
examples we have already solved by a direct application of the 
fundamental definition of a mathematical probability. 

Example 11. — We take first of all our old friend, the problem 
of D'Alembert. What is the probability of throwing head at 
least once in two successive throws with an uniform coin? 

This problem is most easily solved by finding the probability 
first for not getting head in two successive throws. By the 
multiplication theorem this probability is: p = 2 X 2 = i- 
Then the probability to get head at least once is 1 — J = f 
from a simple application of the rule in § 25. A more lengthy 
analysis is as follows. Denoting the event by E, the following 
cases may appear which may bring forth the desired event: 
Head in first throw which we shall denote by Hx and head in 
second throw which we denote by H2, or head in first throw {H^ 
and tail in second {Ti), or finally tail in first (Ti) and head in 
second {H^). Then we have: 

E= Hr- H2+H,- T,+ Tx- H2, 
or: 



P{E) = P(Fi) ■ PiH,) + P(i7i) ■ PiT2) + P(fi) • P{H2) 

— 2^2I2'^2I2'^2 — 4' 



27. Example 12. — What is the probability of throwing at 
least twelve in a single throw with three dice? The expected 
event occurs when either 12, 13, 14, . . . or 18 is thrown. Of 
these events only one may happen at a time. We may, there- 
fore, apply the addition theorem and obtain as the total prob- 
ability: 

p = pu + Pu + Pu+ ■ ■ ■ + Pis- 

where pu, Pn, ■ • ■ Pis are the respective probabilities for throwing 
the sums of 12, 13, • • • or 18. These subsidiary probabilities 
were determined in § 13 under the problem of Galileo, and: 

P = 2¥6 + 2¥6 +^^6 + A°6 + l^fe + ^6 + jh = II- 



36 THE ADDITION AND MULTIPLICATION THEOREMS. [2.^ 

28. Example 13. — An urn contains a white, b black and c red 
balls. A single ball is drawn a + /3 + 7 times in succession, 
and the ball thus drawn is replaced before the next drawing takes 
place. To determine the probability that (1) there are first 
a white, then /3 black and finally 7 red balls, (2) the drawn balls 
appear in three closed groups of a white, (3 black and 7 red balls, 
but the order of these groups is arbitrary, (3) that white, black 
and red balls appear in the same number as above, but in any 
order whatsoever. 

1. Denoting the three subsidiary events for drawing a white, 
j3 black and 7 red balls by Fi, F2 and F3, and the main event for 
drawing the balls in the prescribed order by E, we may write the 
probability for the occurrence of the main event in following 
symbolic form involving symbolic probabilities: 

Substituting the algebraic values for P{Fi), P{Fi) and PiFz) 
in the expression for P{E), and then applying Hausdorff's rule 
(§ 24) we get: 

a°- h^ c^ 

P{E) = pi = (^ ^ 5 ^ ^Y X (a + 6 4- c)» >< (a + 6 + cy 



{a+b + c)»+^+i' • 



2. In the second part of the problem the order of the three 
different groups is immaterial. The three subsidiary events: 
Fi, Fi and F3, may therefore be arranged in any order whatsoever. 
The total number of arrangements is 3! = 6. The probability 
of the happening of any one of these arrangements separately 
is the same as the probability computed under (1). By applying 
the addition theorem we get therefore as the probability of the 
occurrence of this event: 

^' " (a+6 + c)»+«+v 

3. The third part is more easily solved by a direct application 
of the definition of a mathematical probability. The order of 
the balls drawn is here immaterial. Of each individual com- 
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bination of a white, /S black balls and 7 red balls it is possible 
to form (a + |3 + y)lja\^\y\ different permutations as the total 
number of favorable cases. The above number of equally pos- 
sible cases is here (a + 6 + c)""''^"'"'''. Hence we have: 

(a + ^ + 7) ! ,, a'^h^c' 

P3 = -T^r-, X 



a\^ly\ '^ (a+ &+c)''+^+^" 

29. Example 14. — In an urn are n balls among which are a 

white and /3 black. What is the probability in three successive 

drawings to draw (1) first two white and then one black ball, (2) 

two white and one black ball in any order whatsoever? (a+jS^n). 

The probability to draw first one white, then another white and 

finally a black ball is: 

a (g - 1 ) ^ /3 
Vt. = -7Z inX 



n(n- I) {n - 2) •* 

The probability for any of the other arrangements is the same, 
or we have for (2) 

3q! (a - 1) /3 

P2 = 3pi = — 7 -TT X 7 ^ . 

•^ -^ n {n — \) (n — 2) 

30. Example 15. — What is the chance to throw a doublet of 
6 at least once in n consecutive throws with two dice? (Pascal's 
Problem.) 

Chevalier de Mere, a French nobleman and a great friend of all 
games of chance, went more deeply into the complex of causes in 
different games than most of the ordinary gamblers of his time. 
Although not a proficient mathematician he understood suffi- 
cient, nevertheless, to give some very interesting problems for 
which he got the ideas from the gambling resorts he frequented. 
De Mere was a friend of the great French mathematician and 
philosopher, Blaise Pascal, and went to him whenever he wanted 
information on some apparently obscure point in the different 
games in which he participated. The chevalier had from patient 
observation noticed that he could profitably bet to throw a six 
at least once in four throws with a single die. He reasoned now 
that the number of throws to throw a doublet at least once with 
two dice ought to be proportional to the corresponding equal 
number of possible cases with a single die. For one die there are 
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6 possible cases, for two 36. Thus de Mere thought he could 
solely bet to throw a doublet of 6 in 24 throws with two dice. 
An actual trial by several games of dice proved extremely 
disastrous to the finances of the nobleman, who then went to 
Pascal for an explanation. Pascal solved the problem by a direct 
application of the definition of a mathematical probability. We 
shall, however, solve it by an application of the multiplication 
theorem. 

The probability to get a doublet of 6 in a single throw is ^^g. 
The probability of not getting a double six is therefore 1 — jg 
= fg. The probability of the happening of this event n con- 
secutive times is (ff )". Thus the probability of getting a double 
six at least once in n throws with two dice becomes: p = 1 — 
(ff)"- Solving this equation for n we shall have: 

^ log (1 - p) 
log 35 — log 36 ' 
for p = ^ we shall have: 

_ log 2 _ 

log 36 — log 35 

First for 25 throws we may bet safely one to one while for 24 
throws such a bet was unfavorable. This shows the fallacy of 
de Mere's reasoning. 

31. Example 16. — An urn, ^4, contains a balls of which a are 
white, another similar urn, B, contains- b balls of which /3 are 
white. A single ball is drawn from one of the two urns. What 
is the probability that the ball is white? The beginner may easily 
make the following error in the solution of this problem. The 
probability to get a white ball from A is a/a, from B, P/b. Thus 
the total probability to get a white ball is : a/a + fi/b. This 
result is, however, wrong, for we may, by selecting proper values 
for a, b, a and P, obtain a total probability which in numerical 
value is greater than unity. Thus if a — 1, 6=7, a = 5, 
/3 = 4, we get as the total probability: 

^ — 6 I 4 _ 9 

This result is evidently wrong, since a mathematical probability 
is never an improper fraction. The error lies in the fact that we 
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have regarded the two events of drawing a ball from either urn 
as independent and mutually exclusive. A simple application 
of the symbolic rule for relative probabilities will give us the 
result immediately. The main event, E, is composed of the two 
following subsidiary events: (1) to get a white ball from A, or 
(2) to get a white ball from B. We shall symbolically denote 
these two events by A ■ W and B ■ W respectively. Thus we 
have: 

P{E) = PiA-W) + P{B-W) = P(A)P^(W) + PiB)Ps(W). 

Now the probabihty to obtain urn A is P(A) = Pi = 2. also to 
get B: P(B) = P2 = |. The probability to get a white ball 
from A when this particular urn is previously selected is expressed 
by the relative probability: 

Similarly for B: 

Pb{W) = P4 = f . 

Substituting these different values in the expression for P(-E) 
we get finally: 

For the particular numerical example we have: 

= 1/5 4\^^ 
^ 2\7^7l 14" 

32. Example 17. — The probability of the happening of a 
certain event, E, is p, while the probability for the non-occurrence 
of the same event is q = 1 — p. The trial is now to be repeated 
n times. The probability that there will be first a successes 
and then /3 failures is: 

P{E'')Pe'^ {&) = p'^ ■ q^ia + P= n). 

This is the probability that the two complementary events E and 
E happen in the order prescribed above. When the order, in 
which the successes and failures happen, plays no role during 
the n trials, that is to say it is only required to obtain a successes 
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and j3 failures in any order whatsoever in n total trials, then the 
arrangement of the a factors p and /3 factors q is immaterial. The 
total number of arrangements of n elements of which a are equal 
to p and (3 equal to q is simply n\/{al X /S !) . For any one particu- 
lar arrangement of a factors p and /3 factors q the probability of 
the happening of the two complementary events in this particular 
arrangement is equal to p" ■ q^. The Addition Theorem im- 
mediately gives the answer for a successes and /3 failures in any 
order whatseover as: 



p{E--E^) = p^ = ["Dv^r- 



Let us, for the present, regard this probability as being a function 
of the variable quantity, a, {n being a constant quantity). We 
may then write: 

Va = <p{oi). 

Letting a assume all positive integral values from to re the 
above expression for p^ becomes: 



/n\p^- 



n—l 



'^ ■ g"-^ • • • Pn = v^- 



These are the respective probabilities for no successes, one success, 
two successes, . . . and finally n successes in n trials. The 
above quantities are, however, merely the different members of 
the binomial expansion {p + g)". Since p -\- q = \ from the 
nature of the problem, we also have (p + g)" = 1, or po + Pi 
+ P2 + • ■ • -\- Pn = 1. This last equation is the symbolic 
form for the simple hypothetical disjunctive judgment: E must 
happen either 0, 1, 2, • • • or re times in re total trials. We shall 
return to this problem later under the discussion of the BernouUian 
Theorem. In fact, the above example constitutes an essential 
part of this famous theorem which has proven one of the most 
important and far reaching in the whole theory of probability. 
33. Example 18. De Moivre's Problem. — The following prob- 
lem was first given by the eminent French-English mathemati- 
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cian, Abraham de Moivre, in a treatise, entitled " De Mensura 
Sortis," which was pubhshed in London about 1711. 

An urn contains n + 1 balls marked 0, 1, 2, • • • n. A person 
makes i drawings in succession, and each ball is put back in the 
urn before the next drawing takes place. What is the probability 
that the sum of the numbers on the n balls thus drawn equals s? 

The first ball may be drawn in « + 1 ways, the second ball 
may also be drawn in n + 1 ways. Hence two balls may be 
drawn in (n + 1)^ ways or i balls in (n + 1)^ ways: This is the 
total number of equally possible cases. 

If we expand the expression: 

(a;» + .ri + x^ + 3^ + x* + ■■■ a:")' (1) 

after the multinominal theorem, we notice that the coefficient 
to x° arises out of the different ways in which 0, 1, 2, 3, • • • re 
can be grouped together so as to form s by addition, which also 
is the total number of favorable cases. The expression (1) 
inside the bracket represents a geometrical progression, which 
may be written as: 

(1 _ a;"+i)'(l - x)-' = 1 - ^■a;"+' + (2)^'""^' ~ (3) a;5"+' 

+ ...}x{i + «+('+')-'+('D>'-+--l- 

By actual multiplication we get a power series in x. The terms 
containing x' are obtained in the following manner : the first term 
of the first factor being multiplied with the term 

(i -I- g 1 \ 
I x' of the second factor, 

the second term of the first factor multiplied with the term: 

(i -\- s — ?i — 2\ 
I ^.s-n-i (jf j.}jg second factor, 
s — n — I I 

the third term of the first factor multiplied with the term: 

(„ I a;«-2n-2 pf ^jjg second factor. 
s -2n-2 I 



42 THE ADDITION AND MULTIPLICATION THEOREMS. [34 

Thus the coefficient of a;' is equal to 

n + s—l\ /i\/i + s — n — 2\ 
\ s }~ \l)\ s- n- 1 I 

+ \2/ \ s-2n-2 )~ 

The above expression may by further reductions be brought to 
the form: 

(^+l)(^+2) ... (s+i-l) 
1 ■ 2 • ■ • (i - 1) 



-(;) 



(*- 


n)(s ■ 


-n+1) ■■• 


is 


— 


n -]- i 


' — 


2) 








1 • 2 ■ ■ ■ (i 


— 


1) 










(s- 


2n- 


■ l){s-2n) ■ 




(.s 


- 2n 


+ 


i — 


3) 



1 • 2 . . ■ (2 - 1) 



The series breaks of course as soon as negative factors appear 
in the numerator. The required probabihty is therefore 

_ 1 f(5+l)(5+2) ■■■ (^+i-l) 



(n + 1)M 1 • 2 • • • (i - 1) 

(s - n)(.? - n + 1) • • • (5 - n + i - 2) 



(1) 



1 2 ■ • • (i - 1) 



34. Example 19. — If a single experiment or observation is 
made on n pairs of opposite (complementary) events, Fi^ and F^ 
with the respective probabilities of happening Pa and g„ (a = 1, 
2, 3, • • • n), to determine the probability that: (1) exactly r, 
(2) at least r of the events 2?,, will happen. 

This problem is of great importance, especially in life assurance 
mathematics. It happens frequently that an actuary is called 
upon to determine the probability that exactly r persons will be 
alive m years from now out of a group of n persons of any age 
whatsoever, each person's age and his individual coefficient of 
survival through the period being known beforehand. 

Various demonstrations have been given of this problem. The 
first elementary proof was probably due to Mr. George King, 
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the English actuary, in his well-known text-book. The Austrian 
mathematician and actuary, E. Czuber, has simplified King's 
method in his " Wahrscheinlichkeitsrechnung " (1903). Eater 
the Italian actuary, Toja, has given an elegant proof in Bolletino 
degli Attuari, Vol. 12. Finally another Italian mathematician, 
P. Medolaghi, has investigated the problem from the standpoint 
of symbolic logic. In the following we shall adhere to the demon- 
stration of Czuber and also give a short outline of the symbolic 
method. 

In order to answer the first part of the problem we must form 
all possible combinations of r factors of p and n — r factors of q 
and then sum all such combinations of n factors. Denoting 
the event by £[,.] we have: 

= 22J„p3 ■ • • (1 - pj(l - pj • • • (1 - pj. 

We shall now denote the sum of all products in (1) containing <p 
factors p by the symbol S^. It is readily seen that cp will have 
all positive integral values from r to n inclusive. We may 
therefore write the total compound probability in the following 
form: 

PiEiri) = .4oS. + A,Sr+l + A,Sr+2 + " " " + An^S„. (2) 

The student must bear in mind that the different S are merely 
symbols for different sums of all the products of r, r + 1, r -f 2, 
■ ■ • n factors p respectively. Our problem is now to determine 
the unknown coefficients A. It is easily seen that the coefficient 
Ao = 1, since all different products containing r factors p appear 
only once. The other coefficients of the form A do not depend 
on the values of p, however. They remain therefore unaltered 
if we equate all of the various p's and let them equal p. Ex- 
pression (1) then simply becomes ( I • p''{l — p)"~''. We must 
form all possible rth powers of n similar factors, which can 
be done in I ) ways. The expression (2) on the other hand 
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becomes: 

+ • • • + Ar^p''. 

Any S^ is by definition the sum of all products containing <p 

factors p and we may form I | such products from n elements 

p. But we saw above that tp might only have all positive values 
from r to /( inclusive, hence expression (2) will naturally take 
the above form. We have therefore 

(';)-/Ai-p)-=(;)-p^+-ii(,;i)p-^ 

Expanding the expression on the left hand side by means of the 
binomial theorem and equating the coefficients of equal powers 
of p, we get: 

or: 

A^ = (- D— ( "" ) . 

\n — rf 

Substituting these values in (2) for the unknown coefiicients A, 
we shall have: 

Pi^lr]) = S, — y ^ j Sr+1 + ( O ) '^'•+2 — • • • 

+ (-1)-' („->"• 
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If we expand the algebraic expression: 
we have: 

+(-l)-'(„", )«■-••■■ 

We may therefore write P{E) = ,.. . „.,_)_^ ,. when every expo- 
nent is replaced by an index number {i. e., S* replaced by S^) 
and the expansion broken off at the term S". The student must 
of course constantly bear in mind the symbolic meaning of S^. 
The second part of the problem is easily solved by the sym- 
bolic method. Denoting this particular event hj Er, we have 
the following identity: 

P{Er) - P{Er+{) = P(£p,) 

or 

P{Er) - P{E,r,) = P{Er+l). 

The following relations are self-evident: 

P(Eo) = 1; 

S" 1 

PiEm) - i^s~ 1 + S' 

P{Ei) = P{Eo) - P{EJ = 1 - jqp^, also; 

__S S S^ 

P{E2) = PiEi) - P{E,,,) - 1 ^ 5 (1 + s)2 - (1 + S)2- 

The complete induction gives us finally: 

_ S' 

P{Er) -^j ^gy. 

Assuming the rule is true for r, we may easily prove it is true 
for r + 1 also. We have in fact: 



(1 + sy (1 + s)^i (1 + s)^i • 
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35. Example 20. Tchebycheff's Problem.— The following solu- 
tion of a very interesting problem is due to the eminent Russian 
mathematician, Tchebycheff, one of the foremost of modern 
analysts. 

A proper fraction is chosen at random. What is the proba- 
bility it is in its lowest terms? 

Stated in a slightly different wording the same question may 
also be put as follows: If A/B is a proper fraction, what is the 
probability that A and B are prime to each other? 

If Pi, Ps, Pi, ■ ■ ■ Pm denote respectively the probabilities that 
each of the primes 2, 3, 5, • • • m is not a common factor of 
numerator and denominator of A/B, then the probability that 
no prime number is a common factor is: 

P = P2 ■ Pi ■ Pi ■ ■ ■ Pm ■ ■ ■ p, ■ ■ ■ a,<i- inf. (I) 

This follows from the multiplication theorem and from the fact 
that the sequence of prime numbers is infinite. 

Tchebycheff now first finds the probability g™ = I — Pm that 
the fraction A/B does contain the prime m as factor of both A 
and B. By dividing any integral number by the prime m we 
obtain besides the quotient a certain remainder that must be 
one of the following numbers, viz.: 

0, 1, 2, 3, 4, • • • (m - 1). 

Each of the above remainders may be regarded as a possible 
event. The probability to obtain as a remainder is accordingly 
1/m. The probability that m is contained as a factor of A is 
therefore 1/m. This same quantity is also the probability that 
m is a factor of B. The probability that both A and B are 
divisible by m is therefore: 



111 . 1 

— ■ — = — 5, I 
m m m- 

Hence we have for the various primes 
1 _ 1 _i 

02 » P^ Q2 



9m=l — Pm = =-^, or Vm—1—2- 



P2=l-^. P3=l-52» 2'6=1-T2, 
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Formula (I) then takes the form: 

^=(l-2^)(l-|')(l-o^)---d.inf. (ID 

Forming the reciprocal 1/P we get: 
1111 



1-^ 1-i 1-1 



• • • ad. inf. 



Now each factor on the right hand side is the sum of a geometrical 
progression, as: 

P"^U+22+(22J2"' )(l + 3i+(32)2+---) 

( 1 + 55 + (52J2 +••■)■■• ad. inf. 

Multiplying out we shall have: 

J._j^ 1 1 1 J^ 

p 1 2 ~r 02 ' 32 "■ 42 ~r c2 I ■ ■ ■ ^d. int. 

The above infinite series is, however, merely the well known 
Eulerian expression for tt^/G, hence: 

Suppose furthermore we were assured that none of the three 
primes 2, 3, 5 was a common factor of both A and B. What 
would then be the probability that the fraction might be reduced 
by division by one or more of the other primes? 

Denoting by the symbol P(7) the probability that none of the 
primes from 7 and upwards is a common factor, we get : 



also: 



^(7) = ( 1 ~ yi) ( 1 ~ 112) ( ^ ~ 132) ■ ■ ■ ^^- ^°^-' 
^ = ^= (1-22) (1-35) (1-55)^7), 
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or: 

Pc.=^^[(l-|)(l-^)(l-^)] = 0.950. , 

The probability of the divisibihty of both numerator and 
denominator of a fraction chosen at random by a prime larger 
than 5 is thus : 

J_ 

20' 



1 - Po) 



The summation of the infinite series of the reciprocals of the 
squares of the natural numbers bafiled for a long time the skill 
of some of the most eminent mathematicians. Jacob Bernoulli, 
the renowned classical writer on probabilities, proved its conver- 
gency but failed to find its sum. The final summation was first 
performed by Euler. 



CHAPTER V. 

MATHEMATICAL EXPECTATION. 

36. Definition, Mean Values. — It is common belief among 
many people that gambling and all kinds of betting have their 
source in reckless desire. This is often argued by moral reform- 
ers, but cannot be said to be the true cause. Whenever by ordi- 
nary gambling or by a bet, actual value is exposed to a complete or 
partial loss, this exposure is not due to the fact that the gamester 
is reckless, but because there is hope of an actual gain. " Hope," 
says Spinoza in his treatise on ethics, "is the indeterminate joy 
caused by the conception of a future state of affairs of whose 
outcome we are in doubt." Actual mathematical calculation 
cannot be attempted on the basis of this definition any more 
than it could be attempted to determine a mathematical prob- 
ability from the definition of Aristotle. "We disregard there- 
fore the psychological element of desire, which is associated with 
hope or expectation as well as the anxiousness or dread associated 
with the related psychological element of non-desire" (Cantor). 

The so-called mathematical expectation is the product of an 
expected gain in actual value and the mathematical probability 
of obtaining such a gain. The danger of loss may in this case 
be regarded as a negative gain. Thus if a person, A, may expect 
the gain, G, from the event, E, whose probability of happening 
is equal to p, then e = p-G is his mathematical expectation. 
The quantity expressed by the symbol, e, is here the amount it 
is safe to hazard for the expected gain, G. We may also regard 
the quantity, e, as a mean value or average value. Among a 
large number of n cases only np will bring the gain, G, the others 
not. Thus the total gain is : 

pnG -r- n = pG. 

Suppose we have n mutually exclusive events, Ei, Ei, ■•■, En, 
5 49 
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forming a complete disjunction. For their respective prob- 
abilities we have then the following equation : 

Pl + P2+ P3+ ■■ • + Pn = 1. 

If the actual occurrence of a certain one of these events, say, 
E^, brings a gain of (?„, then the total value of the mathematical 
expectation of the n events is: 

e = Pl-Gl + P2-G2+ ■■■ + Pn-Gn = '^Pa'G^. 

Since Spa = 1 this result may be written : 

C X (2)1 + P2 + • • • + Pn) = Gi-pi + G^-Pi + Gz-Pi + • --Gn-Pn, 

hence e may be regarded as the mean value of the different 
quantities G„ with the weights p^ {a = 1, 2, 3, ■ ■ ■ , n). 

Although we shall discuss the theory of mean values in a 
following chapter a few preliminary remarks might not be out 
of place here. 

A variable quantity X is related to a series of events Ei, E2, 
E3, • ■ ■, E„ (it being assumed that these events form a complete 
disjunction) in such a manner that when E^ happens X takes on 
the value x^ {a = 1, 2, 3, ■ ■ ■ , n). If furthermore pi, pi, pz, ■ ■ • 
denote the respective probabilities of the occurrence of Ei, E2, 
E3, • • • , then 

M{X) = piXi + P2X2 + ■ ■ -pnXn 

is called the mean value or simply the mean of X. 

The above definition may be illustrated by the following 
concrete urn-scheme. An urn contains N balls of which ai balls 
are marked Xi, 02 balls marked X2 ■ ■ ■ and finally a„ balls marked 
Xn where ai + 02 + as + ■ ■ -an = N. Each drawing from the 
urn produces a certain number X, which may assume n different 
values xi, X2, X3, ■ ■ ■ , Xn, each with the respective probabilities: 

Ui Oi an 

Vl=^>P2=^---Pn = ^. 

The arithmetic mean of all the numbers written on the balls is: 

O-lXl + 023:2 + • • • anXn 

N 
which agrees with the mean as defined above. 
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37. The Petrograd (St. Petersburg) Problem. — In this con- 
nection it is worthy to note a celebrated problem, which on 
account of its paradoxical nature has become a veritable stumb- 
ling block, and has been discussed by some of the most eminent 
writers on probabilities. The problem was first suggested by 
Daniel Bernoulli in a communication to the Petrograd — or as 
it was then called St. Petersburger Academy — in 1738. 

The Petrograd problem may shortly be stated as follows : Two 
persons A and B are interested in a game of tossing a coin under 
the following conditions. An ordinary coin is tossed until head 
turns up, which is the deciding event. If head turns up the first 
time A pays one dollar to B, if head appears first at the second 
toss B is to receive two dollars, if first at the third time four 
dollars and so on. What is the mathematical expectation of J?? 
Or in other words, how much must B pay to A before the game 
starts in order that the game may be considered fair? 

The mathematical expectation of B in the first trial is 
5X1 = 2- The mathematical expectation for head in second 
throw is {^y X 2 = ^. Or in general the mathematical prob- 
ability that head appears for the first time in the nth toss is 
(§)", and the co-ordinated expectation is 2"~^-j-2" = |. Thus the 
total expectation is expressed by the following series: 

2 + 2 + 2 + ■ ■ ■• 

\Vhen n = 00 as its limiting value it thus appears that B 
could afford to pay an infinite amount of money for his expected 
gain. 

38. Various Explanations of the Paradox. The Moral Expec- 
tation. — This evidently paradoxical result has called forth a num- 
ber of explanations of various forms by some eminent mathe- 
maticians. One of the commentators was D'Alembert. It was 
to be expected that the famous encyclopaedist, who — as we have 
seen — did not view the theory of probabilities in too kindly a 
manner, would not hesitate to attack. He returns repeatedly 
to this problem in the "Opuscules" (1761) and in "Doutes et 
questions" (Amsterdam, 1770). 

D'Alembert distinguishes between two forms of possibilities, 
viz., metaphysical and physical possibilities. An event is by 
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him called a metaphysical possibility, when it is not absurd. 
When the event is not too "uncommon" in the ordinary course 
of happenings it is a physical possibility. That head would 
appear for the first time after 1,000 throws is metaphysically 
possible but quite impossible physically. This contention is 
rather bold. "What would," as Czuber remarks, "D'Alembert 
have said to an actual reported case in 'Grunert's Archiv' where 
in a game of whist each of the four players held 13 cards of one 
suit." The numerical probability of such an event as expressed 
by mathematical probabilities is (635013559600)^. 

D'Alembert's definitions including the half metaphorical term 
"ordinary course" are rather vague. And what numerical 
value of the mathematical probability constitutes the physical 
impossibility? D'Alembert gives three arbitrary solutions for 
the probability of getting head in the nth throw, namely : 

1 1 1 



2"(1 + jSn") ' 2"+"" ' 2"B 



where a, /3, B, K are constants and q an uneven number. 

Daniel Bernoulli himself gives a solution wherein he introduces 
the term "moral expectation." If a person possesses a sum of 
money equal to x then according to Bernoulli 

, kdx 

is the moral expectation of x, k being a constant quantity. 
Integrating. we get: 

Jdy = k j — = k(log b — log a) = k log-, 

which is the moral expectation of an increase h — a of an original 
value a. If now x denotes the sum owned by B we may replace 
the mathematical expectation by their corresponding moral ex- 
pectations, that is to say replace 2"-V2" by (1/2") log ((a+2"~i)/x) 
and we then have: 

^/'ll ^+l_i.ii ^+2 1 x + 2- \ 
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which is a convergent series. In this connection, it may be 
mentioned that the Bernoullian hypothesis has found quite an 
extensive use in the modern theory of utility. 

De Morgan in his splendid httle treatise "On Probabihties" 
takes the view that the solution as first given is by no means an 
anomaly. He quotes an actual experiment in coin tossing by 
Buffon. Out of 2,048 trials 1,061 gave head at the first toss, 
494 at the second, 232 at the third, 137 at the fourth, 56 at the 
fifth, 29 at the sixth, 25 at the seventh, 8 at the eighth and 6 at 
the ninth. Computing the various mathematical expectations, 
we find that the maximum value is found in the 25 sets with head 
in the seventh toss, which gives a gain of 25 X 64 = 1,600. The 
most rare occurrence, the 6 sets of head in the ninth throw gives 
a gain of 6 X 25G = 1,536, which is the next highest gain in all 
the nine sets. De INIorgan furthermore contends that if Buffon 
had tried a thousand times as many games, the results would 
not only have given more, but more per game, arguing "that a 
larger net would have caught not only more fish but more varieties 
of fish; and in two millions of sets, we might have seen cases in 
which head did not appear till the twentieth throw." Further- 
more, "the player might continue until he had realized not only 
any given sum, but any given sum per game." Therefore 
according to De Morgan the mathematical expectation of a 
player in a single game must be infinite. 



CHAPTER VI. 

PROBABILITY A POSTERIORI. 

39. Bayes's Rule. A Posteriori Probabilities. — The problems 
hitherto considered have all had certain points in common. 
Before entering upon the calculations of the mathematical 
probability of the happening of the event in question, we knew 
beforehand a certain complex of causes which operated in the 
general domain of action. We also were able to separate this 
general complex of productive causes into two distinctive minor 
domains of complexes, of which one would bring forth the event, 
E, while the other domain would act towards the production of 
the opposite event, E. Furthermore we also were able to 
measure the respective quantitative magnitudes of the two 
domains, and then, by a simple algebraic operation, determine 
the probability as a proper fraction. The addition and multi- 
plication theorems did not introduce any new principles, but 
only gave us a set of systematic rules which facilitated and 
shortened the calculations of the relations between the different 
absolute probabilities. The above method of determination 
of a mathematical probability is known as an a priori determina- 
tion, and such probabilities are termed a priori probabilities. 

The problems treated in the preceding chapters have, nearly 
all, been related to different games of chance, or purely abstract 
mathematical quantities. The inorganic nature of this kind of 
problems has made it possible for us to treat them in a relatively 
simple manner. In many of the problems, which we shall con- 
sider hereafter, organic elements enter as a dominant factor and 
make the analysis much more complicated and difficult. 

All social and biological investigations, which are of a much 
larger benefit and practical value than the problems in games of 
chance, lead often to a completely different category of probabil- 
ity problems, which are known as " a posteriori probabilities." 
In problems where organic life enters into the calculations, the 
complex of productive causes is so varied and manifold, that 

54 
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our minds are not able to pigeonhole the different productive 
causes, placing them in their proper domains of action. But we 
know that such causes do exist and are the origin of the event. 
If now, by a series of observations, we have noticed the actual 
occurrence of the event, E (or the occurrence of the opposite 
event E), the problem of the determination of an a posteriori 
probability to find the probability that the event E originated 
from a certain complex, say F. We must then, first of all, 
form a complete hypothetical judgment of the form: E either 
happens from the complexes Fi, or f 2, or F^, • • • or f „. But we 
must not forget that, in general, the different complexes F^ 
(a = 1, 2, • • •, n) of the disjunction are not known a priori. 
We must, therefore, determine the respective probabilities for 
the actual existence of such disjunctive complexes F^. These 
probabilities of existence for the complexes of causes are in 
general different for each member, a fact which often has been 
overlooked by many investigators and writers on a posteriori 
probabilities, and which has given rise to meaningless and 
paradoxical results. 

40. Discovery .and History of the Rule. — The first discoverer 
of the rule for the computation of a posteriori probabilities by 
a purely deductive process was the English clergyman, T. Bayes. 
Bayes's treatise was first published after the death of the author 
by his friend. Dr. Price, in Philosophical Transactions for 1763. 
The treatise by the English clergyman was, for a long time, 
almost forgotten, even by the author's own countrymen; and 
later English writers have lost sight of the true " Bayes's Rule " 
and substituted a false, or to be more accurate, a special case of 
the exact rule, in the different algebraic texts, under the discus- 
sion of the so called " inverse probabihties," a name which is due 
probably to de Morgan, and which in itself is a great misnomer. 
This point, presently, we shall discuss in detail. 

The careless application of the exact rule has recently led to 
a certain distrust of the whole theory of " a posteriori proba- 
bilities." Scandinavian mathematicians were probably the first 
to criticize the theory. In 1879, Mr. J. Bing, a Danish actuary, 
took a very critical attitude towards the mathematical principles 
underlying Bayes's Rule, in a scholarly article in the mathe- 
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matical journal Tidsskrift for Matematik. Bing's article caused 
a sharp, and often heated, discussion among the older and younger 
Danish mathematicians at that time; but his views seem to have 
gained the upper hand, and even so great an authority on the 
whole subject as the late Dr. T. Thiele, in his well-known work, 
" Theory of Observations " (London, 1903), refers to Bing's 
article as "a crushing proof of the fallacies underlying the 
determination of a posteriori probabilities by a purely deductive 
method." As recently as 1908, the Danish writer on philosophy. 
Dr. Kroman, has taken up cudgels in defense of Bayes in a 
contribution in the Transactions of the Royal Danish Academy 
of Science, which has done much towards the removal of many 
obscure and erroneous views of the older authors. Among 
English writers, Professor Chrystal, in a lecture delivered before 
the Actuarial Society of Edinburgh, has also given a sharp 
criticism of the rule, although he does not go so deeply into the 
real nature of the problem as either Bing or Kroman. 

Despite Chrystal's advice to " bury the laws of inverse prob- 
abilities decently out of sight, and not embalm them in text books 
and examination papers " the old view still holds sway in recent 
professional examination papers. It is therefore absolutely 
necessary for the student preparing for professional examinations 
to be acquainted with the theory. In the following paragraphs 
we shall, therefore, give the mathematical theory of Bayes's 
Rule with several examples illustrating its application to actual 
problems, together with a criticism of the rule. 

41. Bayes's Rule (Case I). — {The different complexes of causes 
producing the observed event, E, possess different a priori proba- 
bilities of existence.) Let E denote a certain state or condition, 
which can appear under only one of the mutually exclusive 
complexes of causes: Fi, Fi, ■ ■ ■ and not otherwise. Let the 
probability for the actual existence of Fi be k\ and if Fi really 
exists then let wi be the " productive probability " for bringing 
forth the observed event, E {E being of a different nature from 
F), which can only occur after the previous existence of one of 
the mutually exclusive complexes, F. Let, in the same manner, 
F2 have an " existence probability " of ko and a " productive 
probability " of C02, Fs an existence probability of K3 and a pro- 
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ductive probability of C03 • • • etc. If now, by actual observa- 
tion, we have noted that the event E has occurred exactly m 
times in n trials, then the probability that the complex i^i was 
the origin of £ is: 

^^-2«..co.'"(l-coJ"-'» («= 1.2,3, •••). 
Similarly that complex F2 was the origin: 

K2 ■ COz^Cl — C02)"~" 



Q2 = 



2/c„ • co.-Cl - wj" 



and so on for the other complexes. 

Proof. — Let the number of equally possible cases in the general 
domain of action, which leads to one of the complexes F^, be t. 
Fiu-thermore, of these t cases let /i be favorable for the existence 
of complex f i,/2 for F2,fz for F3, • • • , etc. Then the probabilities 
for the existence of the different complexes F^{a = 1,2,3, ■ ■ ■ n) 
are: 

/i fi h .. , 

Ki = — , K2 = -7 , Kz = -r • • • respectively. 

Of the /i favorable cases for complex Fi, Xi are also favorable for 

the occurrence of E. 
Of the fi favorable cases for complex F2, X2 are also favorable for 

the occmrence of E. 
Of the fi favorable cases for complex Fz, X3 are also favorable for 

the occurrence of E. 

The probability of the happening of E under the assumption that 
F-i exists, i. e., the relative probability: Pj,-^{E), is: 





X, 


COi 


/x 




x„ 


w„ 


fa 



or in general: 

(a = 1, 2, 3, • • •). 

The total number of equally likely cases for the simultaneous 
occurrence of the event E with either one of the favorable cases 
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foTFi,F,,F3, •••Is: 

Xi + X2 + X3 + • • • = SX.. 

The number of favorable cases for the simultaneous occurrence 
of Fi and E is Xi, for the simultaneous occurrence of F2 and E, 
X2, • • • , etc. Hence: we have as measures for their corresponding 
probabilities 





^ Xi _ X2 






'^'-■EK' ^'~EK' 




But 








Xl = COl fi, X2 = CO2 • /2, 


■■■, etc.. 


and 








/l = Kl • t, f2 = K2 ■ t, ■ 


• • , etc. 


Hence 







\l = COl ■ Kl • t, X2 = 0)2 • K2 ■ i, ■ • ■ , etc. 

Substituting these values in the above expression for Qi, Q2, 
we get: 

^ Kl • Wl /C2 • CO2 



as the respective probabilities that the observed event originated 
from the complexes Fi, F2, F3, ■ ■ ■ . Such probabilities are called 
a posteriori probabilities. 

Let us now for a moment investigate the above expression for 
Qi, Q2, ■ ■ ■■ The numerator in the expression for Qi is ki • wl 
But Kl is simply the a priori probability for the existence of Fi 
while COl is the a priori productive probability of bringing forth 
the event observed from complex F^. The product ki ■ coi is 
simply the relative probability Pp^{E), or the probability that 
the event E originated from Fi. In the denominator we have 
the expression Sk„w„ (a = 1, 2, ■ ■ ■ n) which is the total proba- 
bility to get E from any of the complexes F^. From example 17 
(Chapter W) we know that the probability to get E exactly m 
times from Fi in n total trials is: 

Vi= (^)'Ci-a;r(l-coi)»-™ 

and the probability to get E from any one of the complexes, F, 
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m times out of w is: 

2P« = {D ^K^ ■ o>:-{X - coj"- (a = 1, 2, 3, ■ • ■). 

If, by actual observation, we know the event E to have happened 
exactly m times out of n, then the a posteriori probability that 
Fi was the origin is: 

( "^ ) Ki ■ cor(l - co,)"-^ 
ei=-77^T (a= 1,2,3, ■••). (I) 



rials (™)i 



The factorials I j in numerator and denominator cancel each 

other of course. It will be noticed that, in the above proof, it 
is not assumed that the a posteriori probability is proportional 
to the a priori probability, an assumption usually made in the 
ordinary texts on algebra. 

42. Bayes's Rule (Case 11). — (Special Case. The a priori 
probabilities of existence of the different complexes are equal.) 
Sometimes the different complexes F may be of such special 
characters thai their a priori probabilities of existence are equal, 
i. e., 

Kl = Ki = Ki = K4: ■ ■ ■ Kn- 

In this case the equation (I) simply reduces to: 

_ cord -coO — 

Equation (I) gives, however, the most general expression for 
Bayes's Rule which may be stated as follows: 

If a definite observed event, E, can originate from a certain series 
of mutually exclusive complexes, F, atid if the actual occurrence of 
the event has been observed, then the probability that it originated 
from a specified complex or a specified group of complexes is also 
the " a posteriori " probability or probability of existence of the 
specified complex or group of complexes. 

43. Determination of the Probabilities of Future Events 
Based Upon Actual Observations. — It happens frequently that 
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our knowledge of the general domain of action is so incomplete, 
that we are not able to determine, a priori, the probability of the 
occurrence of a certain expected event. As we already have 
stated in the introduction to a posteriori probabilities, this is 
nearly always the case with problems wherein organic life enters 
as a determining factor or momentum. But the same state of 
affairs may also occur in the category of problems relating to games 
of chance, which we have hitherto considered. Suppose we had 
an urn which was known to contain white and black balls only, 
but the actual ratio in which the balls of the two different colors 
were mixed, was unknown. With this knowledge beforehand, 
we should not be able to determine the probability for the drawing 
of a white ball. If, on the other hand, we knew, from actual 
experience by repeated observations, the results of former draw- 
ings from the same urn when the conditions in the general domain 
of action remained unchanged during each separate drawing, then 
these results might be used in the determination of the prob- 
ability of a specified event by future drawings. 

Our problem may be stated in its most general form as follows : 
Let F^ denote a certain state or condition in the general domain 
of action, which state or condition can appear only in one or the 
other of the mutually exclusive forms : Fi, F2, F.3, • • ■ . and not 
otherwise. Let the probability of existence of Fi, F2, F^, • • • be 
Ki, K2, K3, ■ ■ ■ respectively, and when one of the complexes Fi, F2, 
Fz, ■ ■ ■ exists (occurs) let wi, W2, ws, • • • be the respective pro- 
ductive probabilities of bringing forth a specified event, E. 
If now, by actual observation, we know the event, E, to have 
happened exactly m times out of n total trials (the conditions in 
the general domain of action being the same at each individual 
trial), what is then the probability that the event, E, will happen 
in the (n + l)th trial also? 

By Bayes's Rule we determined the " a posteriori " probabili- 
ties or the probabilities of existence of the complexes Fi, F2, ■ ■ ■ 
as: 

(a= 1,2,3, ■••). 
In the (n + l)th trial E may happen from any one of the mutually 
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exclusive complexes: Fi, F2, F3, • ■ • whose respective probabilities 
in producing the event, E, are wi, W2, W3, • • • . The addition 
theorem then gives us as the total probability of the occurrence 
of E in the (n + l)th trial: 

Ra = '^PfJ.E) = Ql • Wl + Q2 • CO2 + Qi • 033 

^ Sk, ■ co^^-Cl - CO.)"— • CO, . _ ^ „ „ . (Ill) 

"^■■' S/c, • co.^a - coj"-- l«- i.A'i, ••■;• 

If the a priori probabilities of existence are of equal magnitude 
(Case II) the factors k in the above expression cancel each other 
in numerator and denominator and we have 

^ Sco.-'d - coJ"-"co. 

44. Examples on the Application of Bayes's Rule. — Example 
21. — An urn contains two balls, white or black or both kinds. 
What is the probability of getting a white ball in the first draw- 
ing, and if this event has happened and the ball replaced, what 
is then the probability to get white in the following drawing? 

Three conditions are here possible in the urn. There may be 0, 
1, or 2 white balls. Each hypothetical condition has a proba- 
bility of existence equal to \, and the productive probabilities 
for white are 0, | and 1 respectively. The total probability to 
get white is therefore: 

If we now draw a white ball then the probabilities that it came 
from the complexes: F-^, F^, F3, respectively, are: 

n_^i i_^l 1^1 

U • 2) 6 • 2> 3 • 2- 

These are also new existence probabilities of the three proba- 
bilities. The probability for white in second drawing is therefore . 

(0-i)0+(iH-i)|+(i^i)l = f. 

This solution of the problem is, however, not a unique solution, 
because it is an arbitrary solution. It is arbitrary in this respect, 
that we have without further consideration given all three com- 
plexes the same probability of existence, f . We shall discuss 
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this part of the question under the chapter on the criticism of 
Bayes's Rule. 

Example 22. — An urn contains five balls of which a part is 
known to be white and the rest black. A ball is drawn four 
times in succession and replaced after each drawing. By three 
of such drawings a white ball was obtained and by one drawing 
a black ball. What is the probability that we will get a white 
ball in the fifth drawing? 

In regard to the contents of the urn the following four hypoth- 
eses are possible: 

F\: 4 white, 1 black balls, 



^2: 


3 


(( 


2 


Fz: 


2 


(( 


3 


F,: 


1 


a 


4 



Since we do not know anything about the ratio of distribution 
of the different colored balls, we may by a direct application of 
the principle of insufficient reason regard the four complexes as 
equally probable, or: 



K\= Kl= Kz = Ki= J. 

If either Fi, Fi, F3 or f 4 exists, the respective productive 
probabilities are: 

^,_4 .,_3 ,,_2 ,,_! 

By a direct substitution in the formula: 

Sco^^Cl - CO.)"-™ • w„ 



R 



Sco.™(l - CO J"-™ 
(a = 1, 2, 3, 4) f or w = 4 and m = 3 we get: 

„ anm) + (f)''(i)(f) + (f)^(i)(f) + (mm) ,, 
(i)'(i) + (f )^(f) + (i)^(f) + {\m) ' ^^- 

45. Criticism of Bayes's Rule. — In most English treatises on 
the theory of chance the " a posteriori " determination of a 
mathematical probability is discussed under the socalled " in- 
verse probabilities." This somewhat misleading name was prob- 
ably first introduced by the eminent English mathematician and 
actuary, Augustus de Morgan. In the opening of the discussion 
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of a posteriori probabilities in the third chapter of his treatise, 
" An Essay on Probabilities," de Morgan says: " In the preceding 
chapter, we have calculated the chances of an event, knowing the 
circumstances under which it is to happen or fail. We are now 
to place ourselves in an inverted position, we know the event, 
and ask what is the probability which results from the event 
in favor of any set of circumstances under which the same might 
have happened." Is this now an inverse process? By the a 
priori or — as de Morgan prefers to call them, — the direct prob- 
abilities, we started from a definitely known condition and de- 
termined the probability for a future event, E, or what is the same, 
the probability of a specified future state of affairs. Here we 
start knowing the present condition and try to determine a past 
condition. The process apparently appears to be the inverse of 
the former, although they both are the same. We possess a 
definite knowledge of a certain condition and try to determine 
the probability of the existence of a specified state of affairs, in 
general different from the first condition, but whether this state 
of affairs occurred in the past or is to occur in the future has no 
bearing on our problem. In other words, time does not enter 
as a determining" factor. And even if we were willing to admit 
the two processes of the determination of the different probabil- 
ities to be inverse, the probabilities themselves can not be said 
to be inverse. Nevertheless, this misleading name appears over 
and over again in examination papers in England and in America 
as a thoroughly embalmed corpse which ought to have been 
buried long ago. What is really needed, is a change of customary 
nomenclature in the whole theory of probability. Instead of 
direct and inverse, a priori and a posteriori probabilities, it would 
be more proper to speak about " prospective " and " retro- 
spective " probabilities in the application of Bayes's Rule. All 
probabilities are in reahty determined bj- an empirical process. 
That there is a certain probability to throw a six with a die we 
only know after we have formed a definite conception of a die. 
The only probabilities which we perhaps rightly may name a 
priori are the arbitrary probabilities in purely mathematical 
problems where we assume an ideal state of affairs. " There 
is," to quote the Danish writer on logic. Dr. Kroman, " really 
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more reason to doubt the a priori than the a posteriori probabil- 
ities, and it would be more natural and also more exact in the 
application of Bayes's Rule to speak about the actual or original 
and the new or gained probability." 

The discussion above has really no direct bearing on Bayes's 
Rule but was introduced in order to give the student a clearer 
understanding of the main principles underlying the whole deter- 
mination of a posteriori probabilities by means of actual experi- 
mental observations, and also to remove some obscure points. 
From his ordinary mathematical training every student of mathe- 
matics has an almost intuitive understanding of an inverse process. 
Naturally when he encounters again and again the customary 
heading: "inverse probabilities " in text-books he obtains from 
the very start — almost before he starts to read this particular 
chapter — an inverse idea of the subject instead of the idea he really 
ought to have. Nowhere in continental texts on the theory of 
probabilities, will the reader be able to fiild the words direct and 
inverse applied in the same sense as in English texts since the 
introduction of these terms by de Morgan. We shall advise 
readers who have become accustomed to the old terms to pay 
no serious attention to them. 

46. Theory Versus Practice. — In § 41 we reduced Bayes's 
Rule to its most general form: 

This is an exact expression for the rule, but it is at the same 
time almost impossible to employ it in practice. Only in a few 
exceptional cases do we know, a priori, the different values of the 
often numerous probabilities of existence /c„, of the complexes 
F^, and in order to apply the rule with exact results we require 
here sufScient facts and information about the different com- 
plexes of causes from which the observed event, E, originated. 
Bayes deduced the rule from special examples resulting from 
drawings of balls of different colors from an urn where the different 
complexes of causes were materially existent. The probability 
of a cause or a certain complex of causes did not here mean the 
probability of existence of such a complex but the probability 
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that the observed event originated from this particular complex. 
In order to elucidate this statement we give following simple 
example: 

Example 23. — A bag contains 4 coins, of which one is coined 
with two heads, the other three having both head and tail. A 
coin is drawn at random and tossed four times in succession and 
each time head turns up. What is the probabihty that it was 
the coin with two heads? 

The two complexes Fi and Fi, which may produce the event, 
E, are: Fi, the coin with two heads, and Fi, an ordinary coin. 
The probability of existence of Fi is the probability of drawing 
the single coin with two heads which is equal to \, the probability 
of existence for the other complex, F2, is equal to |. The 
respective productive probabilities are 1 and \. Thus ki = \, 
/C2 = 4, wi = 1 and W2 = 2- Substituting these values in formula 
(I) (m = 4, m = 4), we get: 

Q = (i X 1^) -^ (J X 1^ + ! X {\Y) = i - Jl = H- 

But in most cases we do not know anything about the material 
existence of the complexes of causes from which the event, E, 
originated. On the contrary, we are forced to form a hypothesis 
about their actual existence. To start with a simple case we take 
example 21 of § 44. 

We assumed here three equally possible conditions in the urn 
before the drawings, namely the presence of 0, 1, or 2 white balls. 
From this assumption we found the probability to get a white 
ball in the second drawing, after we had previously drawn a white 
ball and then put it back in the urn before the second drawing, 
to be equal to |. As we already remarked, this solution is not 
unique because it is an arbitrary solution. It is arbitrary to 
assign, without any consideration whatsoever, | as the probability 
of existence to each of the three conditions. Let us suppose 
that each of the two balls bore the numbers 1 and 2 respectively. 
We may then form the following equally likely conditions: 

6162, hiW2, biWi, wiiVi, 

each condition having an a priori probability of existence equal 
to \ and a productive probability for the drawing of a white 
6 
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ball equal to: 0, §, | and 1 respectively. Thus: 
and 



Kl = Ki — Ki = Ki = I 



Wl =0, W2 = I, '•'3 = h W4 = 1- 

The respective a posteriori probabilities, that is the new or 
gained probabilities of the four hypothetical conditions, become 
now by the application of Bayes's Rule (Formula II) : 

Qi — 2' Q2 = 2 "=" 2, Q3 = 2 "=" 2, Qi = 2- 
Hence the probability for white in the second drawing is: 

( Formula lY: R = - ^ lti \„ T ) 

E = ^ 2 + (I -^ 2) + (i ^ 2) + (1 - 2) = i 

In the first solution we got | for the same probability. Which 
answer is now the true one? Neither one! The true answer to 
the problem is that it is not given in such a form that the last 
question — the probability of getting a white ball in the second 
drawing — may be settled without any doubt. The answer must 
be conditional. Following the first hypothesis we got |, while 
the second hypothesis gives f as the answer. 

We next proceed to example 22 which is almost identical in 
form to the first one, the only difference being a greater variety 
of hypothetical conditions. We started here with the following 
four hypotheses: 

Fi: 4 white, 1 black ball, F2: 3 white, 2 black, F3: 2 white, 3 
black and ^^4: 1 white and 4 black balls, assigning J as the hy- 
pothetical existence probability. 

By marking the 5 balls similarly as in the last example, with 
the numbers from 1 to 5 we may form the complexes: 

Fi: 4 white and 1 black ball in (5) ways, 
^2: 3 " "2 " balls " (i) " 
F3: 2 " "3 " " " (I) " 
^4=1 " " 4 " " " (6) " 

This gives us a total of 5 + 10 + 10 + 5 = 30 different 
complexes. Assuming all of these complexes equally likely 
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to occur, we get following probabilities of existence and pro- 
ductive probabilities: 

Kl = K2 = K3 = K4 = • • • = K30 = ^V 

wi = &;2 = W3 = C04 = C05 = f (Productive prob. for Fi) 
coe = wy = '^s = • • • = W16 = f (Productive prob. for F2) 
W16 = CO17 = • • • = £025 = f (Productive prob. for F3) 
W26 = W27 = W28 = W29 = W30 = 3- (Productive prob. for Fi). 

The total probability of getting a white ball in the second 

. . Sw^l - CO J CO, ^ 1 o "} •3n^ 

drawing is now -r^; — 57- r — {a = I, 2, S, • ■ ■ , 60). 

2/C0„ (,1— coj 

Actual substitution of the above values of co in this formula 
gives us the final result as: R = ^^. 

47. Probabilities Expressed by Integrals. — By making an ex- 
tended use of the infinitesimal calculus Mr. Bing and Dr. Kroman 
in their memoirs arrived at much more ambiguous results through 
an application of the rule of Bayes. Starting with the funda- 
mental rule as given in equation (I) in § 41, we may at times en- 
counter somewhat simpler conditions inside the domain of 
causes. The total complex of actions may embrace a large 
number of smaller sub-complexes construed in such a way that 
the change from one complex to another may be regarded as a 
continuous process, so that the productive probabilities are 
increased by an infinitely small quantity from a certain lower 
limit, a, to an upper limit, b. Denoting such continuously in- 
creasing probabilities by v and the corresponding small proba- 
bilities of existence by udv, we have as the total probability of 
obtaining E from any one of the minor complexes with a pro- 
ductive probability between a and {a ^ a, ^ ^ b) 

p = I uvdv. 

The probability that when £ has happened it originated from 
one of those minor complexes, or the probability of existence of 
some one of those complexes is: 



r 
i: 



uvdv 
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The situation may be still more simplified by the following con- 
siderations. In the continuous total complex between the limits 
a and h we have altogether situated (6 — a) jdv individual minor 
complexes, If we assume all of these complexes to possess the 
same probability of existence, we must have: 



udv = 



dv 



b — a 
The two formulas then take on the form: 



P 



and 



= -- r 

p — '^ 

r 



vdv 



vdv 



A still more specialized form is obtained by letting a = and 
b = 1 which gives: 



-r 



vdv and P = 



f 



vdv 



i: 



vdv 



The above formulas may perhaps be made more intelligible 
to the reader by a geometrical illustration. 



^ 





Let the various productive probabilities, v, be plotted along the 
A' axis in a Cartesian coordinate system in the interval from a 
to b (a < b). To any one of these probabilities say Vr there 
corresponds a certain probability of existence, m,, represented 
by a Y ordinate. In the same manner the next following pro- 
ductive probability, ?)r+i, will have a probability of existence 
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represented by an ordinate Wr+i. It is now possible to represent 
the various u's by means of areas instead of line ordinates. Thus 
the probability of existence, Ur, is in the figure represented by 
the small shaded rectangle, with a base equal to 

?)r+l — Vr = A2)r, 

and an altitude of Ur, the total area being equal to AvrUr. That 
this is so, follows from the well-known elementary theorem from 
geometry that areas of rectangles with equal bases are directly 
proportional to their altitudes. The sum of the different it's is 
thus in the figure represented as the sum areas of the various 
small rectangles in the staircase shaped histograph. Now ac- 
cording to our assumption « is a continuous function in the interval 
from a to b. We may, therefore, divide this interval, b — a, 
into n smaller equal intervals. Let 

b — a 

«r+l — Vr = AUr = 



be one of these smaller divisions. By choosing n sufficiently 
large, (b — a)/n or Av becomes a very small quantity and by 
letting n approach infinity as a limiting value we have 

T b — a 

lim u = Iim uAv = udv. 

In this case the histograph is replaced by a continuous curve and 
udv is the probability of existence that the productive probability 
is enclosed between v and v -{- dv} 

The probability to get E from any one of the complexes Is 
evidently given by the total area of the small rectangles, or in 
the continuous case by means of the integral: 






uvdv . 



^ A more rigorous analysis would be as follows : We plot along the abscissa 
axis intervals of the length e so that the middle of the interval has a distance 
from the origin equal to an integral multiple of «. If now e is chosen suffi- 
ciently small, we may regard the probability of existence of u, for values of 
the variable v between re — |e and re + Je as a constant and the probability 
that V falls between the limits re — Je and re + Je may hence be expressed as 
eUr. When e approaches as a limiting value this expression becomes vdv. 
See the similar discussion under frequency curves. 
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In the same way the probabiUty that E originated from any 

of the complexes between a and ^ is: 

••3 
uvdv 



I 



f 

*J a 



uvdv 



The special case a = and & = 1 needs no further commentary. 
We are now in a position to consider the examples of Bing and 
Kroman. Any student familiar with multiple integration will 
find no difficulty in the following analysis. For the benefit of 
readers to whom the evaluation of the various integrals may seem 
somewhat difficult, we may refer to the addenda at the close of 
this treatise or to any standard treatise on the calculus as, for 
instance, WiUiamson's " Integral Calculus." 

48. Example 24. — An urn contains a very large number of 
similarly shaped balls. In 10 successive drawings (with replace- 
ments) we have obtained 7 with the number 1, 2 with the number 
2, and one having the number 3. What is the probability to 
obtain a ball with another number in the following drawing? 

We must here distinguish between 4 kinds of balls, namely 
balls marked 1, 2, 3, or " other balls." A general scheme of 
distribution of the balls in the urn may be given through the 
following scheme: 

nx balls marked with the number 1, 
ny " " " " " 2, 

nz " " " " " 3 and 

nt = n{\ — X — y — z) other balls. 

Here x, y, z and t represent the respective productive probabil- 
ities. If we now let all such probabilities assume all possible 
values between and 1 with intervals of l/n, we obtain the pos- 
sible conditions in the total complex of actions. Each of these 
conditions has a probability of existence, *, and the productive 
probabilities x, y, z, and \ — x — y — z. The original probability 
for 7 ones, 2 twos and 1 three in 10 drawings is: 

10' 



48] EXAMPLE 24. 71 

Now when m is a very large number the interval 1/n becomes a 
very small quantity, and we may approximately write: 

s = udxdydz, 

and also write the above sum as a triple integral: 

10! C^ f^ n 

where 

y = 1 — X and q = 1 — x — y. 

If now the above event has happened, then the probability to get 
a different marked ball in the 11th drawing is: 

ml 
u ■ x' • y^ • z{l — X — y — z) ■ dx ■ dy ■ dz 



Q = 



ml 
u • x' • y'^ ■ z ■ dx • d,y ■ dz 



It is, however, quite impossible to evaluate the above integral 
without knowing the form of the function u; but unfortunately 
our information at hand tells us absolutely nothing in regard to 
this. Perhaps the balls bear the numbers 1, 2 and 3 only, or 
perhaps there is an equal distribution up to 10,000 or any other 
number. Our information is really so insufficient that it is quite 
hopeless to attempt a calculation of the a posteriori probability. 

Many adherents of the inverse probability method venture, 
however, boldly forth with the following solution based upon the 
perfectly arbitrary hypothesis that all the m's are of equal magni- 
tude. This gives the special integral: 

mi 
x' ■ y'^ ■ z{\ — X — y — z), dx • dy • dz 
_ _ 

V /-•! /^P /»« 

I I I x'' • y^ ■ z ■ dx ■ dy ■ dz 

Jo Jo Jo 

where once more it must be remembered that 

x + y + z^l. 

In this case the limits of x are and 1, those of y are and 1 — x 
and those of z are and 1 — x — y. 

This is a well-known form of the triple integral which may be 
evaluated by means of Dirichlet's Theorem: 
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1 a;'^V~^2""W.r ■ dy ■ dz = 

Jo 



-'-"-^ V{b)T{m)V{n) 



T{l-[-h+m-\-n) 

(See Williamson's Calculus.) 

Remembering the well-known relation between gamma func- 
tions and factorials, viz. V{n-\- 1) = n\, we find by a mere 
substitution in the integral, the value of the probability in 
question to be 1:14. Another and equally plausible result is 
obtained by a slightly different wording of the problem. 

Ten successive drawings have resulted in balls marked 1, 2, 
or 3. What is the probability to obtain a ball not bearing such 
a number in the 11th drawing? This probability is given by 
the formula. 






t)i°(l - v)dv 



f v^^dv 
Jo 



= 1 : 12. 



Quite a different result from the one given above. 

49. Example 25 — Bing's Paradox. — A still more astonishing 
paradox is produced by Bing when he gives an example of Bayes's 
Rule to a problem from mortality statistics. A mortality table 
gives the ratio of the number of persons living during a certain 
period, to the number living at the beginning of this period, 
all persons being of the same age. By recording the deaths 
during the specified period (say one year) it has been ascertained 
that of s persons, say forty years of age at the beginning of the 
period, m have died during the period. The observed ratio is 
then (s — m)/s. If * is a very large number this ratio may (as 
we shall have occasion to prove at a later stage) be taken as an 
approximation of the true ratio of probability of survival during 
the period. If s is not sufficiently large the believers in the inverse 
theory ought to be able to evaluate this ratio by an application 
of Bayes's Rule, by means of an analysis similar to the one as 
follows : 

Let y be the general symbol for the probability of a forty- 
year-old person being alive one year from hence. Each of such 
persons will in general be subject to different conditions, and the 
general symbol, y, will therefore have to be understood as the 



49] EXAMPLE 25. bing's paradox. 73 

symbol for all the possible productive probability values changing 
from to 1 by a continuous process. 

Assuming s a very large number each condition will have a 
probability of existence equal to udy. We may now ask: What 
is the probability that the rate of survival of a group of s persons 
aged 40 is situated between the limits a and ;8? 

The answer according to Bayes's Rule is: 



f 



£ 



^{l - y)-^udy 

1 (I) 

f-^{l - y)"'udy 
Jo 

Let us furthermore divide the whole year into two equal parts 
and let t/i be the probability of surviving the first half year, 
2/2 the probability of surviving the second half, and Wi • dyi, 
M2 • dyi the corresponding probabilities of existence. Then the 
respective a posteriori probabilities for y^ and 2/2 are: 

yr^'{l - 7/1)™ Mirfj/i 



X 



JO 

and 



1 
2/i^-"Hl - 2/i)™Mi^2/i 

i.r^{l - y2)"^-U2dy2 



(mi + m2 = m) 



f 2/2^ (1 - y^T'-Uidyi 
'Jo 

(mi and mi represent the number of deaths in the respective half 
years.) The probability that both yi and 2/2 are true is then 
according to the multiplication theorem: 

2/1""' (1 - yi)""Mi%it/2'~"(l - y%)""'Uidy2 

I 2/i»-^'(l - yi)"''uidyi I 2/2*^(1 - y2)"''U2dy2 
i/o •-'0 

where y = yi ■ y2. 

The probability that the probability of survival for a full 
year, y, is situated between the limits a and /3 is therefore : 



r ] yr^'C^ - yiT'yr^{l - 2/2)'"Hti • u^ ■ dy^ ■ dy^ 

'^. w. (11) 

I yr^\l - yiT'uidyi 2/2'""(l - y2)'"'u2dy 
Jo Jo 



74 PEOBABILITY A POSTERIOKI. [49 

where the Umits in the double integral in the numerator are de- 
termined by the relation: 

a ^ z/12/2 = ^■ 

Choosing the principle of insufficient reason as the basis of 
our calculations, merely assuming that all possible events are, in 
the absence of any grounds for inference, equally likely, the 
various quantities expressed by the general symbol, u, become 
equal and constant and cancel each other in numerator and 
denominator, which brings the a posteriori probabilities ex- 
pressed by (I) and (II) to the forms: 



X 






. (HI) 



2/^^(1 - yYdy 
and 

2/i'-""(l - 2/i)™'2//-'"'-'^(l - ?/2)'^(^2/i • dyi 



n> 



f ^//-'"'(l - 2/1)"' / y2^{l - y2)'^dyi ■ dy2 
•Jo •Jo 



(IV) 



where the limits in the numerator in the latter expression are 
determined by the relation : a < yiy^ < (3. 
Letting 

y 

V2 = — 

^ 2/1 
and then 

I - yi = z{\ - y) 

this latter expression may after a simple substitution be brought 
to the form: 



X(3 r z"* 

2A-"(1 -y)^^dyj - 



'2"*^! - 2)"'<^2 



'^'-'^ (V) 



Jj/i-'-Kl - yiT'dy^j 2/2'-^(l - 2/2)'^(^2/2 

(See appendix.) 

Mr. Bing now puts the further question : What is the probability 
that a new person forty years of age, entering the original large 



49] EXAMPLE 25. 



(O 



group of s persons, will sur\-ive one year, when we assume 
OTi = 7K2 = 0? (Ill) gives the answer: 






« + 2 ' 

y'ay 



Formula (V), on the other hand, gives us: 

dz 



I 2^^1(1 - y)dy j - , , i^2 

Jo Jo 1 — -d — y) ^ ( s+ l y 

~ \s+2} ■ 



I yidyi I 
Jo Jo 



y^dyi 



As the above analysis is perfectly general, we might equally 
well have applied it to each of the semi-annual periods, which 
would give us an a posteriori probability of survival equal to 

I .-, I for each half year, or a compound probability of 

_i_ .^ J for the whole year. Extending this process it is 

easily seen that by di'v'iding the year into parts, we shall have 

(* -I- 1 \ " 
~^_^ I as the final probability a posteriori that a forty-year- 
old person will reach the age of forty-one. By letting n increase 
indefinitely the above quantity approaches as its limiting 
value and we obtain thus the paradox of Bing : 

//, among a large group of s equally old persons, we have observed 
no deaths during a full calendar year then another person of the 
same age outside the group is sure to die inside the calendar year. 

This is e^'idently a very strange result, and yet, working on 
the basis of the principle of insufficient reason, the mathematical 
deductions and formula exhibit no errors. 

Mr. Bing disposes of the whole matter by simply denj4ng the 
validity and existence of a posteriori probabilities. Dr. Kroman 
on the other hand defends Bayes's Rule. " Mathematics," 
Kroman says, " is — as Huxley has justly remarked — an ex- 
ceedingly fine null stone, but one must not expect to get wheat 
flour after ha%-ing put oats in the quern." According to the 
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Danish scholar the paradox is due to the use of a wrong formula. 
We ought to have used the general formula (II) instead of formula 
(V) which is a special case. In the general formula we encounter 
the functions u, denoting the probability existence of the various 
productive probabilities y. As we do not know anything about 
this function u it is hopeless to attempt a calculation. This 
brings the criticism down to the fundamental question whether 
we shall build the theory of probabilities on the principle of 
" cogent reason " or the principle of " insufficient reason." 

50. Conclusion. — Contradictory results of a similar kind to 
the ones given above have led several eminent mathematicians 
to a complete denunciation of the laws underlying a posteriori 
probabilities. Professor Chrystal, especially, becomes extremely 
severe in his criticism in the previously mentioned address before 
the Actuarial Society of Edinburgh. He advises " practical 
people like the actuaries, much though they may justly respect 
Laplace, not to air his weaknesses in their annual examinations. 
The indiscretions of a great man should be quietly allowed to be 
forgotten." Although one may heartily agree with Professor 
Chrystal's candid attack on the belief in authority, too often 
prevailing among mathematical students, I think — aside from 
the fact that the rule was originally given by Bayes — that the 
great French savant has been accused unjustly as the following 
remarks perhaps may tend to show. 

In our statement of Bayes's Rule, we followed an exact mathe- 
matical method, and the final formula (I) is theoretically as 
correct as any previously demonstrated in this work. The 
customary definition of a mathematical probability as the 
ratio of equally favorable to coordinated possible cases, is not 
done away with in this new kind of probabilities; the former are 
found in the numerator and the latter in the denominator; and 
if we take care that each of the particular formulas, with its 
definite requirements, is applied to its particular case, we do not 
go beyond pure mathematics or logic. But are we able to get 
complete and exact information about these requirements? In 
the example of the tossing of a coin with two heads, this informa- 
tion was at hand. Here we were able to enumerate exactly the 
different mutually exclusive causes from which the observed 
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event originated. We were also able to determine the exact 
quantitative measures for the probabilities, k, that these com- 
plexes existed as well as the different productive probabilities, w. 
Here the most rigid requirements could be satisfied, and the rule 
gave therefore a true answer. 

In the other examples we encountered a different state of 
affairs. Here we were not able to enumerate directly the dif- 
ferent complexes of causes from which the event originated, but 
were forced to form different and arbitrary hypotheses about the 
complexes of origin, F, and each hypothesis gave, in general, a 
different result. Furthermore, we assumed a priori that the 
different probabilities of the actual existence of the complexes 
were all equal in magnitude, and it was, therefore, the special 
formula (II) we employed in the determination of the a posteriori 
probabilities. In this formula, the different k's do not enter at 
all as a determining factor; only the productive probabilities, as, 
are considered. The assumption that all the k's are equal in 
magnitude is based upon the principle of insufficient reason, or 
as Boole calls it, " the equal distribution of ignorance." 

The principle of equal distribution of ignorance makes in the 
case of continuously varying productive probabilities, v, the 
function, u, of the probabilities of existence of the various 
complexes equal to a constant quantity. In other words, the 
curve in Fig. 1, is replaced by a straight line of the form, u = k. 
Now, as a matter of fact, we possess in most cases, some partial 
knowledge of the complexes of action producing the event in 
question. This partial knowledge — although far from complete 
enough to make a rigorous use of formula (I) — is nevertheless 
sufficient to justify us in discarding completely any general 
hypothesis assuming such simple conditions as above. Such 
partial knowledge is, for instance, found in the Paradox of Bing. 
Here the rather absurd hypothesis was made that the possible 
values of the probability of surviving a certain period were 
equally probable. In other words, it is equally probable that 
there will die 0, 1, 2, ■ ■ ■ , or s persons in the particular period. 
" Common sense, however, tells us that it is far more probable 
that, for instance, 90 per cent, of a large number of forty-year-old 
persons will survive the period than no one or every one will die 
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in the same period " (Kroman). The indiscreet use of formula 
(II) therefore naturally leads to paradoxical results. On the 
other hand, the fallacy of the happy-go-lucky computers, em- 
ploying the special case (II) of Bayes's Rule, as well as the critics 
of Laplace, lies in their failure to make a proper distinction 
between " equal distribution of ignorance " and " partial cogent 
reason," which latter expression properly may be termed " an 
unequal distribution of ignorance." If, despite the actual 
presence of such unequal distribution of ignorance, we still insist 
in using the special formula (II), which is only to be used in the 
case of an equal distribution of ignorance, it is no wonder we 
encounter ambiguous answers. Not the rule itself, its discoverer, 
or Laplace, but the indiscreet computer is the one to blame. 
Messrs. Bing, Venn and Chrystal, in their various criticisms, have 
filled the quern with some rather " wild oats " and expected to 
get wheat flour; and that one of those critics in his disappoint- 
ment in not getting the expected flour should blame Laplace, is 
hardly just. 

So much for the principle of " equal distribution of igno- 
rance." It may be of interest to see how matters turn out when 
we like von Kries insist upon the principle of " cogent reason " 
as the true basis of our computations. The reader will quite 
readily see that a rigorous application of the Rule of Bayes in its 
most general form as given by formula (I) really tacitly assumes 
this very principle. In formula (I), we require not alone an 
exact enumeration of the various complexes from which the 
observed event may originate, but also an exact and complete 
information about the structure of such complexes in order to 
evaluate their various probabilities of existence. If such informa- 
tion is present, we can meet even the most stringent requirements 
of the general formula, and we will get a correct answer. But 
in the vast majority of cases, not to say all cases, such information 
is not at hand, and any attempt to make a computation by means 
of Bayes's Rule must be regarded as hopeless. We may, how- 
ever, again remark that very seldom we are in complete ignorance 
of the conditions of the complexes, which is the same thing as 
saying that we are not in a position to employ the principle of 
equal distribution of ignorance in a rigorous manner. From 
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other experiments on the same kind of event, or from other 
sources, we may have attained some partial information, even if 
insufficient to employ the principle of cogent reason. Is such 
information now to be completely ignored in an attempt to give 
a reasonable, although approximate answer? It is but natiu-al 
that the mathematician should attempt to obtain as much of 
such information as possible and use it in the evaluation of the 
various probabilities of existence. Thus for instance, if, in the 
Paradox of Bing, we had observed that the probability of survival 
for a forty-year-old person never had been below .75 and never 
above .95, it would be but reasonable to substitute those limits 
in their proper integrals in order to attain an approximate answer. 
To illustrate this somewhat subjective determination of an a 
posteriori probability, we take another example from the memoirs 
of Bing and Kroman. 

Example (24)- — A merchant receives a cargo of 100,000 pieces 
of fruit. If every single fruit is untainted, the value of the cargo 
may be put at 10,000 Kroner. On the other hand, any part of 
the cargo more or less tainted is considered worthless. The 
merchant has never before received a similar cargo and does not 
know how the fruit has been afFected by travel. As samples, he 
has selected 30 pieces picked at random from the cargo and all 
samples proved to be fresh. He asks a mathematician what 
value he can put on the cargo. 

If the mathematician uses the special formula (II), assum- 
ing an equal distribution of ignorance, therefore assuming that 
it is equally probable that for example none, 5,000 or all the 
individual pieces of fruit were untainted, the answer is: 

10,000^^;! = 9687.5 Kroner. 

I 1^'^dv 

If we use the true rule, the a posteriori probability of the whole- 

someness of the cargo is given by the integral: 

•■1 



I 

Jo 



I 



1 
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where v is the general expression for a possible probability of 
wholesomeness between and 1 and udv the corresponding proba- 
bility of existence. Now if the mathematician has no complete 
information as to this particular function, ?t, it would be foolish 
cf him to attempt a calculation, since the hypothesis of an equal 
probability of existence for all possible values of v evidently 
gives an arbitrary and perhaps a very erroneous result. On 
the other hand, the computer may possibly have access to some 
partial information. Perhaps the merchant has received fruit 
of a similar kind or heard about cargoes of this particular kind 
of fruit received by other dealers. If now the merchant were 
able to inform the computer that in a great number of similar 
cases the probability of wholesomeness had been between 0.9 
and 1 with an approximately even distribution, while it never 
had been below 0.9, then nothing would hinder the mathematician 
to present the following computation: 



I V 

«/0.9 



I v^dv 

Jo.9 



- = 0.9726 



and tell the merchant that on the basis of the information given 
9,726 Kroner would be a fair price for the cargo. 

This is really the point of view taken by the English mathe- 
matician, Professor Karl Pearson, one of the ablest writers on 
mathematical statistics of the present time, when he says: "I 
start, as most Avriters on mathematics have done, with ' the 
equal distribution of ignorance ' or I assume the truth of Bayes's 
Theorem. I hold this theorem not as rigidly demonstrated, but 
I think with Edgeworth that the hypothesis of the equal dis- 
tribution of ignorance is, within the limits of practical life, 
justified by our experience of statistical ratios, which are unknown, 
i. e., such ratios do not tend to cluster markedly round any 
particular point." 

To sum up the above remarks: Theoretically Bayes's Rule is 
true. If we are able to enumerate and determine the probabilities 
of existence of the complexes of origin it will also give true 
results in practice. If we are justified in assuming the principle 
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of " insufficient reason " or " equal distribution of ignorance " 
as the basis for our calculations, formula (II) may be employed 
with exact results after a rigid enumeration of the complexes. 
If the principle of " cogent reason " is required as the basis, an 
exact computation is in general hopeless, and we can only after 
having obtained partial subjective information give an approxi- 
mate answer. 

With these remarks we shall conclude the elementary dis- 
cussion of the merely theoretical part of the siibject. The follow- 
ing chapters require in most cases a knowledge of the infinitesimal 
calculus, and many of the questions discussed above will appear 
in a new and instructive light by this treatment. 



CHAPTER VII. 

THE LAW OF LARGE NUMBERS. 

51. A Priori and Empirical Probabilities. — In the previous 
chapters we Hmited ourselves to the discussion of such mathe- 
matical probabilities, where we, a priori, on account of our 
knowledge of the various domains or complexes of actions, were 
able to enumerate the respective favorable and unfavorable 
possibilities associated with the occurrence or non-occurrence of 
the event in question. " The real importance of the theory of 
probability in regard to mass phenomena consists, however, 
in determining the mathematical relations of the various proba- 
bilities not in a deductive, but in an empirical manner — without an 
a priori exhaustive knowledge of the mutual relations and actions 
between cause and effect — by means of statistical enumeration 
of the frequency of the observed event. The conception of a 
probability finds its justification in the close relation between the 
mathematical probabilities and relative frequencies as determined in 
a purely empirical way. This relation is established by means 
of the famous Law of Large Numbers " (A. A. Tschuprow). 

To return to our original definition of a mathematical proba- 
bility as the ratio of the favorable to the coordinated equally 
possible cases, we first notice that this definition is wholly 
arbitrary like many mathematical definitions. The contention 
of Stuart Mill that every definition contains an axiom is rather 
far stretched. In mathematics a definition does not necessarily 
need to be metaphysical. A striking example is offered in 
mechanics by the definitions of force as given by Lagrange and 
Kirchhoff. What is force? " Force," Lagrange says, " is a 
cause which tends to produce motion." Kirchhoff on the other 
hand tells us that force is the product of mass and acceleration. 
Lagrange's definition is wholly metaphysical. Whenever a 
definition is to be of use in a purely exact science such as mathe- 
matics, it must teach us how to measure the particular phe- 
nomena which we are investigating. Thus, to quote Poincare, 

82 
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" it is not necessary that the definition tells us what force really 
is, whether it is a cause or the effect of motion." 

An analogous case is offered in the criticism of a mathematical 
probability as defined by Laplace, and the attempts to place 
the whole theory of probabilities on a purely empirical basis by 
Stuart Mill, Venn and Chrystal. These writers contend " that 
probability is not an attribute of any particular event happening 
on any particular occasion. Unless an event can happen, or 
be conceived to happen a great many times, there is no sense in 
speaking of its probability." The whole attack is directed against 
the definition of a mathematical probability in a single trial 
which definition, evidently by the empiricists, is regarded as 
having no sense. The word " sense " must evidently be con- 
sidered as having a purely metaphysical meaning. In the same 
manner Kirchhoff's definition might be dismissed as having no 
sense, since it would seem as difficult to conceive force as a purely 
mathematical product of two factors, mass and acceleration, as 
it is to conceive the definition of a mathematical probability 
as a ratio. 

The metaphysical trend of thought of the above writers is 
shown in their various definitions of the probability of an event. 
Mill defines it merely as the relative frequency of happenings 
inside a large number of trials, and Venn gives a similar defini- 
tion, while Chrystal gives the following: 

" If, on taking any very large number N out of a series of cases 
in which an event, E, is in question, E happens on pN occasions, 
the probability of the event, E, is said to be p." 

Let us, for a moment, look more closely into these statements. 
Any definition, if it bears its name rightly, must mean the same 
to all persons. Now, as a matter of fact, the vagueness in a 
half metaphorical term like " any very large number " illustrates 
its weakness. The question immediately confronts us " what is 
a very large number? " Is it 100, 1,000 or perhaps 1,000,000? 

A fixed universal standard for the value of N seems out of the 
question and the definition — although perhaps readily grasped 
in a " general way " — can hardly be said to be happily chosen. 

Another, and perfectly rigorous definition, is the following one 
given by the Danish astronomer and actuary, T. N. Thiele. 
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Thiele tells us that " common usage " has assigned the word 
probability as the name "for the limiting value of the relative 
frequency of an event, when the number of observations (trials), 
under which the event happens, approach infinity as a limit." 
A similar definition is later on given by the American actuary 
R. Henderson, who says : " The numerical measure which has been 
universally adopted for the probability of an event under given 
circumstances is the ultimate value, as the number of cases is 
indefinitely increased, of the ratio of the number of times the 
event happens under those circumstances to the total possible 
number of times." There is nothing ambiguous or vague in these 
definitions. Infinity, taken in a purely quantitative sense, has a 
perfectly uniform meaning in mathematics. The new definition 
differs, however, radically from our customary definition of a 
mathematical a priori probability. We cannot, therefore, agree 
with Mr. Henderson when he continues " the measure there given 
has been universally adopted and this holds true in spite of the 
fact that the rule has been stated in ways which on their face differ 
widely from that above given. The one most commonly given 
is that if an event can happen in a ways and fail in b ways all of 
which are equally likely, the probability of the event is the ratio 
of a to the sum of a and h. It is readily seen that if we read 
into this statement the meaning of the words " equally likely," this 
measure, so far as it goes, reduces to a particular case of that given 
above." 

In order to investigate this statement somewhat more closely, 
let us try to measure the probability of throwing head with an 
ordinary coin by both our old definition of a mathematical 
probability and the definition by Mr. Henderson of what we 
shall term an empirical probability. Denoting the first kind of 
probability by P(E) and the second by P'{E) we have in ordinary 
symbols 

P(E) = I 

P'iE) = lim F{E, v) 

where the symbol F{E, v) denotes the relative frequency of the 
event, E, in v total trials. No a priori knowledge will tell us 
offhand if P'(E) will approach ^ as its ultimate value. The 
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two methods are radically different. By the first method the 
determination of the numerical measure of a probability depends 
simply on our ability to judge and segregate the equally possible 
cases into cases favorable and unfavorable to the event E. By 
the second method the determination of the probability depends, 
not alone on the segregation and consequent enumeration of the 
favorable from the total cases, but chiefly on the extent of our 
observations or trials on the event in question. 

52. Extent and Usage of Both Methods. — Before entering into 
a more detailed discussion of the actual quantitative comparison 
of the two methods, it might be of use to compare their A'arious 
extent of usage. In this respect the empirical method is vastly 
superior to the a priori. A rigorous application of the a priori 
method, as far as concrete problems go, is limited to simple 
games of chance. As soon as we begin to tackle sociological or 
economical practical problems it leaves us in a helpless state. 
If we were to ask about the probability that a certain person 
forty years of age would die inside a year, it would be of little use 
to try to determine this in an a priori manner. Even a purely 
deductive process, as illustrated by Bayes's Rule in the earlier 
chapters, leads to paradoxical results. Our a priori knowledge 
of the complexes of causes governing death or survival is so 
incomplete that even a qualitative — not to speak of a quanti- 
tative — judgment is out of the question. The empirical method 
shows us at least a way to obtain a measure for the probabihty 
of the event in question. By observing during a period of a year 
an infinite number of forty-year-old persons of whom, after an 
exhaustive qualitative investigation, we are led to believe that 
their present conditions as far as health, social occupation, en- 
vironments, etc., are concerned are equally similar, we may by 
an enumeration of those who died during the year obtain the 
desired ratio as defined by P'{E). Of course, observation 
an infinite number is practically impossible. An approximate 
ratio may be formed by taking a finite, but a large, number 
of cases under observation. But how large a number? This 
very question leads straightforward to another problem, namely 
the quantitative determination of the range of variance between 
the approximate ratio and the ideal ultimate ratio as defined by 
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the relation 

P'iE) = lim FiE, v). 

Since it is impossible to make an infinite number of observations 
we cannot find the exact value of the range of such variations. 
But we may, however, determine the probability that this range 
does not exceed a certain fixed quantity, say X, in absolute mag- 
nitude. Stated in compact form our problem reduces to the 
following form: To determine the probability of the existence 
of the following inequality : 



hm FiE, v)-- 



^X 



where both a and s are finite numbers. This, to a certain extent, 
contains in a nut shell some of the most important problems in 
probabilities. 

The above problem may be solved in two distinct ways. The 
first, and perhaps the most logical way, is by a direct process. 
This is the method followed by T. N. Thiele in his " Almindelig 
lagttagelseslsere,"^ published in Copenhagen, 1889, a most 
original work, which moves along wholly novel lines. Thiele 
distinguishes between (1) Actual observation series as recorded 
from observation, in other words statistical data. (2) Theoret- 
ical observation series giving the conclusions as to the outcome of 
future observations and (3) Methodical laws of series where the 
number of observations is increased indefinitely. By such a 
process, purely a theory of observations, the whole theory of 
probability becomes of secondary importance and rests wholly 
upon the theory of observed series, a fact thoroughly emphasized 
by Thiele himself. When the author first, in the closing chapters 
of his book, makes use of the word probability it is only because 
" common usage " has assigned this word as the name for the 
ultimate frequency ratio designated by our symbol lim F{E, v). 

r=oo 

The problem may, however, be solved in an indirect way, 
which is the one I shall adopt. This method, as first consistently 
deduced by Laplace, has for its basis our original definition of a 
mathematical a priori probability and may be briefly sketched as 
follows: We first of all postulate the existence of an a priori 

'English edition, "Theory of Observations," London, 1905. 
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pirobability as defined, although its actual determination, by a 
priori knowledge, is impossible except in a few cases, as, for 
instance, simple games of chance, drawing balls from urns, etc. 
Denoting such a probability by P{E), or p, we next ask. What will 
be the expected number, say a, of actual happenings of the event, 
E, expressed in terms of s and p, when we make s consecutive 
trials instead of a single trial, and what will be the number of 
happenings of E when s approaches infinity as its ultimate value? 
If such a relation is found between p, a and s, where p is the 
unknown quantity, we have also found a means of determining 
the value of p in known quantities. Our next question is — 
What is the probability that the absolute value of the difference 
between p and the relative frequency of the event as expressed 
by the ratio of a to 5 does not exceed a previously assigned 
quantity? Or the probability that 



a 



X? 



Now, as the reader will see later, we shall prove that 
lim F{E, ■») = P{E) = p. 

V=QO 

It must, however, be remembered that this result is reached by a 
mathematical deduction, based upon the postulate of mathe- 
matical probabilities, and not in the manner as suggested in the 
above statement by Mr. Henderson. 

It is only after having established such purely quantitative 
relations that we are entitled to extend the laws of mathematical 
probabilities as deduced in the earlier chapters to other problems 
than the simple problems of games of chance. 

53. Average a Priori Probabilities. — In the previous para- 
graphs of this chapter, another important matter is to be noted, 
namely the assumption that the complex of causes producing 
the event in question remains constant during the repeated 
trials (observations), or, stated in other words the mathematical 
a priori probability remains constant. Under this limitation 
the extension of the laws of mathematical probabilities would 
have but a very limited practical application. In all statistical 
mass phenomena such an ideal state of affairs is rather a very 
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rare exception. If we consider an ordinary mortality investiga- 
tion we know with absolute certainty that no two persons are 
identically alike as far as health, occupation, environment and 
numerous other things are concerned. Thus the postulated 
mathematical probability for death or survival during a whole 
calendar year will in general be different for each person. We 
may, however, conceive an average probability of survival for a 
full year defined by the relation 

Pi + P2 + Ps + ■ ■ ■ Pb 2p 

P' = S = T' 

where pi, pi, pz, ■ ■ ■ are the postulated probabilities of each 
individual under observation. Our task is now to find: 

1. An algebraic relation between the average probability as 
defined above, the absolute frequency a and the total number of 
observations (trials) s, 

2. The same relation when s approaches a as its ultimate value, 

3. The probability of the existence of the inequality, 



a 
Po 



sx, 



s 

where a denotes the absolute frequency of the occurrence of the 
event, s the total number of observations (trials) and X an ar- 
bitrary constant. 

54. The Theory of Dispersion. — As we mentioned before the 
empirical ratio a/s represents only an approximation of the ideal 
ultimate value of lim F{E, v). If we now make a series of 

V=00 

observations (trials) on the occurrence of a certain event E, such 
that instead of a single set of observations of s individual ob- 
servations we take N such sets, we shall have N relative frequency 
ratios : 

Ol 02 «3 Oj?- 

s ' s ' s'' '" s ' 

Since the ratios are approximations only of the ultimate ratio 
they will in general exhibit discrepancies as to their numerical 
values and may be regarded as N diff'erent empirical approxima- 
tions. The question now arises how these various empirical 
ratios group themselves around the value of lim F{E, v). The dis- 
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tribution of the empirical ratios around the ultimate ratio is by 
Lexis called " dispersion." 

55. Historical Development of the Law of Large Numbers. — 

The first mathematician to investigate the problems we have 
roughly outlined in the previous paragraphs was the renowned 
Jacob Bernoulli in the classic, " Ars Conjectandi," which rightly 
may be classified as one of the most important contributions on 
the subject. Bernoulli's researches culminate in the theorem 
which bears his name and forms the corner-stone of modern 
mathematical statistics. That Bernoulli fully realized the great 
practical importance of these investigations is proven by the 
heading of the fourth part of his book which runs as follows: 
" Artis Conjectandi Pars Quarta, tradens usum et applicationem 
prsecedentis doctrinse in civilibus et ceconomicis." It is also 
here that we first encounter the terms " a priori " and " a pos- 
teriori " probabilities. Bernoulli's researches were limited to 
such cases where the a priori probabilities remained constant 
during the series or the whole sets of series of observations. 
Poisson, a French mathematician, treated later in a series of 
memoirs the more general case where the a priori probabilities 
varied with each indi-\'idual trial. He also introduced the technical 
term, " Law of Large Numbers " (" Loi des Grand Xombres "). 
Finally Lexis through the publication in 1877 of his brochure, 
" Zur Theorie der Massenerscheinungen der menschlichen Gesell- 
schaft," treated the dispersion theory and forged the closing 
link of the chain connecting the theory of a priori probabilities 
and empirical frequency ratios. Of late years the Russian mathe- 
matician, Tchebycheff, the Scandina^^an statisticians, Wester- 
gaard and Charlier, and the Italian scholar, Pizetti, have con- 
tributed several important papers. It is on the basis of these 
papers that the following mathematical treatment is founded. 
In certain cases, however, we shall not attempt to enter too 
deeply into the theory of certain definite integrals, which is 
essential for a rigorous mathematical analysis, but which also 
requires an extensive mathematical knowledge which many of 
my readers, perhaps, do not possess. To readers interested in 
the analysis of the various integrals we may refer to the original 
works of Czuber and Charlier. 



CHAPTER VIII. 

INTRODUCTORY FORMULAS FROM THE INFINITESIMAL 
CALCULUS. 

56. Special Integrals. — In the following chapters we shall 
attempt to investigate the theor\' of probabilities from the stand- 
point of the calculus. Although a knowledge of the elements 
of this branch of mathematics is presupposed to be possessed 
by the student, we shall for the sake of convenience briefly 
review and demonstrate a few formulas from the higher analysis 
of which we shall make frequent use in the following paragraphs. 
All such formulas have been given in the elementary instruction 
of the calculus, and only such readers who do not have this 
particular branch of mathematics fresh in memory from their 
school days need pay any serious attention to the first few 
paragraphs. 

57. Wallis's Expression for tt as an Infinite Product. — We wish 
first of all to determine the value of the definite integral: 

Jn = qJ sin" xdx, (1) 

under the assumption that w is a positive integral number. This 
integral is geometrically equal to the area between the x axis, 
the axis of y, the ordinate corresponding to the abscissa \ir and 
the graph of the function y = sin" x. Letting u' = DxU = sin x, 
V = sin"~^ X, we get by partial integration : 

J„ = — cos X sin"~^ a; r ^+ gf' '^ cos x{n— 1) sin"~^ x cos xdx. (2) 

If we substitute the upper and lower Hmits in the first term on 
the right hand side of the above expression for J„ this term 
reduces to 0, assuming « > 1. Thus we have: 

Jn = (n — l)oJ" sm^~^ x-cos,^ xdx. 
90 
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Putting cos^ X = 1 — sin^ x, we get: 

Jn= {n- l)J'^' sin"-2 xdx - {n - 1) J""" sin" xdx. (3) 

The last integral is, however, equal to J„ and the first integral 
is, following the notation from (1), equal to /n-2. ^Ve shall 
therefore have: 

Jn+ (n — l)Jn = (n — l)Jn-2, 



or 



nJn= {n- 1)J^2- (4) 

Replacing nhy n — 1, n — 2, n — 3, •■• successively we get: 

nJn = (n — l)Jr^2, 
{n — 1) J,^_l = (W — 2)Jn-3, 
{n — 2)Jn-2 = (n — 3)Jn-i, 



According as n is even or uneven we shall have one of the 
following equations at the bottom of the recursion formula: 

Jo = o/" ^ sin" xdx = J^' ^ dx = ^ir, 
or 

Ji = ^J smxdx = — cos a; " = 1. (5) 

If, for even values of n, we let n = 2m, and, for uneven values, 
n = 2m — 1, we get finally the following recursion formulas: 

2TnJ2m= (2m— l)J2v>-2, (2m— 1) J2m-1= (2m— 2) J2m-3, 

(2m — 2) J2m-2 = (2m— 3)J2m^i, (2m— B)J2m^3= (2m— 4) J2m-Si, 

2J2 = l-k, 3^3 = 2X1. 

Successive multiplication of the above equations gives us 

finally: 

(2m- l)(2m- 3)---l ir 
*'''"" 2m(2m-2)---2 ^2' 

_ (2m- 2)(2m- 4)- ••2 *■ ^ 

Jtm-i - (2m - l)(2m - 3) • ■ -3 ■ 

We may now draw some very interesting conclusions from the 



92 FOBMULAS FEOM THE INFINITESIMAL CALCULUS. [58 

above equations. Both integrals represent geometrically areas 
bounded by the graphs of the functions: 

y = sin^" X and y = sin^""^ x respectively. 

The difference of the ordinates of these graphs, namely: 

(sin X — 1) sin^""' x 

is evidently decreasing with increasing values of the positive 
integer n, since sin x lies between and + 1 and sin^"*"^ x ap- 
proaches the value except for certain values of x. The larger 
we select m the less is the difference of the two areas and the 
ratio will therefore approach 1, or the expression 

(2m- 2)(2w - 4)- ■■2 (2m- l)(2m - 3) •■•3 _ t 
(2m- l)(2m - 3)---3 ^ 2m(2m-2)---2 ~2" 

Hence: 

IT ,. 22-42-62---(2m- 2)2.2m 

= lim 



2 ;r;i2.32.52-.-(2m- 3)2(2m- 1)2* 

Multiplying with 22-42-62- • ■ (2m - 2)^ we get: 

T ,. 2^"'-3m[(m - 1)/]^ ,. 22"'(m/)2 PT" 

K = hm — pTT, -xw> — or lim =:^ = ^ir 2. 

2 m=« [(2m - l).f ^=„ (2m/) M2m 

This is the formula originally discovered by the English 
mathematician, John Wallis (1616-1703), and by means of which 
IT may be expressed as an infinite product. 

58. De Moivre — Stirling's Formula. — We are now in a position 
to give a demonstration of Stirling's formula for the approximate 
value of n! for large values of n. A. de Moivre seems to have 
been the first to attempt this approximation. In the first edition 
of his "Doctrine of Chances" (1718) he reaches a result, which 
must be regarded as final, except for the determination of an 
unknown constant factor. Stirling succeeded in completing this 
last step in his remarkable "Methodus Differentialis " (1738). 
In the second edition of " Doctrine of Chances" (1738) de Moivre 
gives the complete formula with full credit to Stirling. He 
mentions as his belief that Stirling in his final calculation possibly 
has made use of the formula of Wallis. The demonstration by 
the older English authors is rather lengthy and much shorter 
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methods have been de\'ised by later writers. ^lost authors 
make use of the Eulerian integral of the second order by which 
any factorial may be expressed by a gamma fimction: 



r(n + 1) = J^x'^e-^'dx 



m. 



Another method makes use of the well-known Euler's Summation 
Formula from the calculus of finite differences. This method is 
of special interest to actuarial students, who frequently use the 
Eulerian formula in the computation of various life contingencies. 
For the benefit of those interested in this particular method we 
may refer to the treatises of Seliwanoff and ]\Iarkhoff, two 
Russian mathematicians.^ 

The Italian mathematician, Cesaro, has, however, derived 
the formula in a much simpler manner.^ 

Cesaro starts with the inequalities: 



(-^r 






From a well-known theorem from logarithms we have : 
1, «+ 1 1 ,1,1, 



2n + 1 ' 3(2?! + \f ' 5(2n + 1)* ' 
which also may be written as follows: 

y=(n+i)log.(l + J)=l + 3^2„Vl)^+5(2n+l)*+---- 

If all the coefficients 3, 5, • • • are replaced by the number 3, 
we obtain a geometrical series. The summation of this infinite 
series shows that 

1< .V < 1 + ^ 



or 



I2n(n + 1)' 

. n+l '2 , , 1 



If we let 



/ l\n+l/2 1 



_ w!e" _ (n + 1) ! e"+i 



(n + l)"+3'2' 

1 Seliwanoff, "LeJirbuch der Differenzenrechnung," Leipzig, 1905, pages 
59-60; Markhoff, "Differenzenreclmung," Leipzig, 1898. 

2 Cesaro, "Corso di analisa algebrica," Torino, 1884, pages 270 and 480. 
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then 

_M^^ (1 + llnY+^'^ 
Un+i e 

Dividing the quantities in (I) by e we have: 



Un 



1<— ^< 6"""'+". (II) 

Un+1 



The exponent of e may be written as follows: 
1 1 1 



12nin + 1) 12n 12{n + 1) " 

IVIaking use of this relation (II) may be written in the following 
form: 

__!_ 1 

12ra 



Denoting the quantity : Un ■ e~^ 'i^" by uj, we shall have two mon- 
otone number sequences: 

Ml, M2, Ms, • • • Un, Un+1, ' ' ' , 
Ml', W, Ms',- • -Un', Un+\,' ■ ■ ■■ 

These two sequences show some very remarkable features. 
With increasing values of n the values of m„ decrease, or the 
sequence is a monotone decreasing number sequence. The 
values of Un become larger when n is increased and form there- 
fore a monotone increasing number sequence. But any member 
of this latter series satisfies, however, the inequality 

Since both number sequences are situated in a finite Interval 
it follows from the well-known theorem of Weierstrass that they 
both have a clustering point, i. e., a point in whose immediate 
region an infinite number of points of the sequence are located. 
Denoting this point of cluster by a, we have here an increasing 
and a decreasing monotone sequence which both converge 
towards a, or: 

lim Un = lim m„ = a. 

n=oo w=oo 

This relation may be illustrated by the accompanying diagram : 
If we now let lim m„ = lim M„-e~''^^" = a, then we shall have 
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for every finite value of n: 



Un-e 



-l/12n 



< a < w„. 



where a = iin-e~^'^" (0 < 6 < 1). 
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or n^^ ^ 



This gives us finally the following expression for n!: 



(Ill) 



In this expression we need only determine the unknown 
coeflBeient a. The formula of Wallis gives immediately: 

(2-4-6---2n )^ 2'-{niy j—~ 

lim i=— = nm -;=^= -VTr/z. 

»=«. (2n)!-V2w «=«.(2n)!V2w 

Substituting in this latter expression the value for factorials 
as found in (III) and neglecting the quantity: d/12n, we have 
after a few reductions : 



an 



lim , 

«=«. V27i(2m) 



= V7r/2, or o = A/2ir, 



from which we easily obtain De Moivre-Stirling's Formula in its 
final form: 

nl = V2^-n"+i/2-e-". 

This remarkable approximation formula gives even for com- 
paratively small values of n surprisingly accurate results. Thus 
for instance we have: 

10! = 3,628,800; lOi^e-^" V20^ = 3,598,699. 



CHAPTER IX. 

LAW OF LARGE NUMBERS. MATHEMATICAL DEDUCTION. 

5Q. Repeated Trials. — Let us consider a general domain of 
action wherein the determining causes remain constant and 
produce either one or the other of the opposite and mutually 
exclusive events, E and E, with the respective a priori prob- 
abilities p and q (,q = 1 — p)ma single trial. The trial (observa- 
tion) will, however, be repeated s times with the explicit assump- 
tion that the outward conditions influencing the different trials 
remain unaltered during each observation. The simplest ex- 
ample of observations of this kind is offered by repeated drawings 
of balls from an urn containing white and black balls only, and 
where the ball is put back in the urn and mixed thoroughly with 
the rest before the next drawing takes place. We keep now a 
record of the repetitions of the opposite events, E and E during 
the s trials, irrespective of the order in which these two events 
may happen. This record must necessarily be of one of the 
following forms : 

E happens s times, E times, 

E " s-1 " El " 

E " s-2 " E2 " 



E " " Es 



In Chapter IV, Example 17, we showed that the probabilities 
of the above combinations of the two events, E and E, were 
determined by the expansion of the binomial 

(p + qY- 

96 
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The general term 

is the probabUity P{E°-E^) that E will happen a and E /3 times 
in the s total trials. Each separate term of the binomial expan- 
sion of (p + qY, represents the probability of the happening of 
the two events in the order given in the above scheme. 

60. Most Probable Value. — In dealing with these various 
terms, it has usually been the custom of the English and French 
mathematicians as well as many German scholars to pay par- 
ticular attention to a special term, the maximum term, which 
generally is known as the "most probable value" or the "mode." 
Russian and Scandinavian writers and the followers of the Lexis 
statistical school of Germany have preferred to make another 
quantity known as the "probable" or "expected value," the 
nucleus of their investigations. Although it is our intention to 
follow the latter method, we shall discuss first, briefly, the most 
probable value. Two questions are then of special interest 
to us: 

(1) What particular event is most probable to happen? 

(2) What is the probability that an event will occur whose 
probability does not differ from that of the most probable event 
by more than a previously fixed quantity? 

Neither of the two questions offers any particular principal 
difficulties from a theoretical point of view. When regarding 
the probability P{E°'&), which we shall denote by T, as a func- 
tion of the variable quantity, a, T evidently will reach a maximum 
value for a certain value of a, (/3 = s — a), and we need only 
determine the greatest term in the above binomial expansion. 

In order to answer the second question we have only to pick 
out all the terms which are situated between the two fixed limits. 
Their sum is then the probability that those two limits are not 
exceeded. 

61. Simple Numerical Examples. — When s is a comparatively 
small number the actual expansion may be performed by simple 
arithmetic. We shall, for the benefit of the student, give a 
simple example of this kind. 

8 
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A pair of dice is thrown 4 times in succession, to investigate 
the chance of throwing doublets. 

In a single throw the probability of getting a doublet is 

P — 'a'>{ 9 = 'a) ■ Expanding I ^ + ^ ) by means of the bi- 
nominal theorem we get l^j + i l-j l-j + qI-J l^j 

"'"^(fi/lfi) "'"IfiJ" ^^^^ of t^6 above terms represents 

the probability of the occurrence of the various combinations of 
doublets (E) and non-doublets (E), and it is readily seen that 
the event of getting no doublets at all, represented by the 

last term ( ^ ) = 0.4823, has the greatest probability. In other 

'i\\)rds it is the most probable event. 

Let us next repeat the trial 12 times instead of 4. The 13 
possible probabilities for the various combinations of doublets 
and non-doublets will then be expressed by the respective terms 
in the expression 



il-IJ 



The 13 members have as their common denominator the quantity 
2,176,782,336 and as numerators the following quantities: 1, 60, 
1,650, 27,500, 309,375, 2,475,000, 14,437,500, 61,875,000, 193,- 
359,375, 429,687,500, 644,531,250, 585,937,500, 244,140,625, 
which now shows that the most probable combination is the one 
of 2 doublets and 10 non-doublets, having a numerical value 
equal to .2961. 

A further comparison will show that the most probable 
event in the second series had the probability .2961, whereas 
.4823 was its value in the first series. In other words, the prob- 
ability decreases when the trials (observations) are increased. 
This is due to the fact that the total number of possible cases 
becomes large with the increase of experiments. 

Another question which presents itself, in this connection, 
is the following : What is the probability that an event will occur 
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whose probability does not differ from the most probable value by 
more than a previously fixed quantity? Let us suppose we were 
asked to determine the probability that a doublet does not occur 
oftener than 5 times and not less than 1 time in 12 trials. This 
probability is found by adding the numerical values of the prob- 
abilities as given in the binomial expansion from the term 

containing p = ^ to the power 6, to p to the first power or 

14,437,500 + 6,187,500 + 193,359,375 + 429,687,500 

+ 644,531,250 + 585,937,500 

2,176,782,336 

62. The Most Probable Value in a Series of Repeated Trials. 

— In the examples just given we determined the probability for 
the happening of the most probable event in a series of s observa- 
tions by a direct expansion of the binomial {j) -{-qY- This 
may be done whenever 5 is a comparatively small number. But, 
when s takes on large values, this method becomes impracticable, 
not to say impossible. Suppose that s = 1,400, then the actual 
straightforward expansion (p + 5)""" would require a tremen- 
dous work of calculation which no practical computer would be 
willing to undertake. We must therefore in some way or other 
seek a method of approximation by which this labor of calcula- 
tion may be avoided and try to find an approximate formula by 
which we are able to express the maximum term in a simple 
manner, involving little computation and at the same time 
yielding results close enough for practical as well as theoretical 
purposes. Jacob Bernoulli in his famous treatise "Ars Conjec- 
tandi" was the first mathematician to solve this problem. 
Bernoulli also gave an expression for the probability that the 
departure from the most probable value should not exceed pre- 
A'iously fixed limits. The method, however, was very laborious 
and the final form was first reached by Laplace in " Th6orie des 
Probabilites." 

We saw before in Chapter IV that the general term 
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in the binomial expansion (j> + q)° represented the probability 
that an event, E, will happen a times and fail (3 times in s trials, 
where p and q were the respective probabilities for success and 
failure in a single trial. The exponent a may here take all posi- 
tive integral values in the interval (0, s), including both limits. 
The question now arises, which particular value of a, say a„, 
will make the above quantity a maximum term in the expansion 
of the binomial? If a„ really is this particular value, then it 
must satisfy the following inequalities: 

(a„+l)!(^„- 1)^' ^ ^ajpnl^ * 
(I) (11) 

sl 
= («„- l)!(/3„+l)I^ ^ • 
(III) 

Dividing (II) by (III) and (II) by (I) we obtain the following 
inequalities : 

an q - Pn p- 

which also may be written 

i^H + l)p ^ qcxn and («„ + l)q ^ /3„p. 

The following reductions are self evident: 

{s — an+ l)p ^ anil — p) or sp + P ^ ««> 
and 
(a„ + l)g ^ (s — an)p or a„g + a„p ~^ sp — q oy an '^ sp — q. 

From which we see that «„ satisfies the following relation : 

ps — q ^ an ^ ps -\- p. 

Since p -\- q = 1, we notice that «„ is enclosed between two 
limits whose difFerence in absolute magnitude equals unity. 
The whole interval in which a„ is situated being equal to unity, 
and since a„ must be an integral number, this particular a„ is 
determined uniquely as an integral positive number when both 
ps — q and ps + p are fractional quantities. If ps — q is an 
integral number ps + p will also be integral, and a„ had to be a 
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fractional number in order to satisfy the above inequality. 
Since by the nature of the problem a„ can take positive integral 
values only, the binomial expansion of (p -\- qY must have two 
terms which are greater than any of the rest. Dividing both 
sides of the inequality by s, we shall have 

Since both p and q are proper fractions, both p/s and q/s are less 
than 1/s. We may therefore safely assume that the highest pos- 
sible difference between the two quotients ajs and /3„/s and the 
probabilities p and q will never exceed 1/s. Now if s is a very 
large number this quantity may be neglected, and we may 
therefore '«Tite ps = an and qs = /3„. 

Substituting these values in our original expression for the 
general term of the binomial expansion we get as the maximum 
number: 

^- {sp)lisq)\P ^ • 

63. Approximate Calculation of the Maximum Term, T,^. — 
\\Tien the trials are repeated a large number of times the straight- 
forward calculation of the maximum term becomes very laborious. 
The onlj' table facilitating an exact computation is in a work 
"Tabularum ad Faciliorem et Breviorem Probabilitatis Com- 
putationem Utilium, Enneas," by the Danish mathematician, 
C. F. Degen. This table, which was published in 1824, gives the 
logarithms to twelve places for all values of nl from w = 1 to 
n = 1,200. Degen's table is, however, not easily obtained, and 
even if it were, it would be of little or no value for factorials 
above 1,200 !. Our only resort is therefore to find an approximate 
expression for the above value of n I. This is most conveniently 
done by making use of Stirling's formula for factorials of high 
orders. We have 

si = s^^'^e-'^, 

isp)l = {spY^^'^e-'P^, 

{sq)l = (s5)"+i/2e-»«V2^. 
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Substituting the above values in the expression s//((sp)/ (sq)!) 
we get 



1 



Hence we have 



which reduces to 



psjH-i i2q,q+i 12 ^2irs 






p8p+ll2q8q+ll2^2ws' 

1 



T = 



Vl> 



Tspq 



as an approximate value for the maximum term. 

Tchehycheff's Theorems. — Despite all that has been said about 
the most probable value, its use is somewhat limited, and it 
might well, without harm, be left out of the whole theory of 
probabilities. Just because an event is the most probable it 
does by no means follow it is a very probable event. In fact the 
expression ( '^2-n-spq)~^ which for large values of s converges 
towards zero shows that the most probable event in reality is a 
very improbable event. This statement may seem a little 
paradoxical; but it is easily understood by realizing that the 
most probable event is only a probability for a possible combina- 
tion among a large number of equally possible combinations of a 
different order. 

Instead of finding the most probable event it is more important 
in practical calculations to determine the average number or 
mean value of the absolute frequencies of successes. In Chapter 
V we pointed out the close relation between a mathematical 
expectation and the mean value of a variable. This relation is 
used by the Russian mathematician, Tchebycheff, as the basis 
of some very general and far-reaching theorems in probabilities, 
by means of which the Law of Large Numbers may be established 
in an elegant and elementary manner. 

64. Expected or Probable Value. — In Chapter V we defined 
the product of a certain sum, s, and the probabihty of winning 
such a sum as the mathematical expectation of s. It is, however, 
not necessary to associate the happening of the e-\'ent with a 
monetary gain or loss, in fact it serves often to confuse the 
reader and we may generalize the definition as follows. If a 
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variable a i may assume any of the values ai, ai, as • • • a^ each, with 
a respective probability of existence <p{ai) (i = 1, 2 ■ • • s) and such 
that 'Zipioii) = 1, then we define: 

Zaiip{ai) = e{ai) 

as the expected value of ai. 

Some writers use also the term probable value instead of 
expected value. In other words the expected value of a variable 
quantity, a, which may assume any one of the values ai, a2- ■ •«» 
is the sum of the products of each individual value of the variable 
and the corresponding probability of existence of such value. 

Suppose we now have two opposite and complementary events 
E and E for which the probabilities of happening in a single 
trial are equal to p and q = 1 — p respectively. When the 
trials are repeated s times the probabilities of E happening s 
times, E no times, of E happening s — 1 and E once, oi E s — 2 
and E 2 times and so on, may be expressed by the individual 
terms of the expansion: 

(p + qY, 
where the general term expressing the occurrence oi E a times 
and of -E (s — a) times is: 



^(a) = (j: 



which is also the piobability of the existence of the frequency 
number a. The variable in the binomial expansion is a, which 
may assume all values from to * inclusive. 

We now first of all proceed to find the expected value — or the 
mathematical expectation — of the following quantities: 

a, [a — e(a)] and [a — e{a)f. 

We shall presently show the reason for the selection of the 
above expressions, which perhaps may appear at the present, 
somewhat puzzling to the student. 

In mathematical symbols the expected values of the above 
quantities are expressed as follows- 

e{a) = Sav'(«). «[« — e(a)] = 2[a — e{a)]<p(a) 
and 

e[a — e{a)Y = 2[a — e(a)Y<p{a) 

and the summation is to take place from a = and to a = s. 
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65. Summation Method of Laplace. The Mean Error. — 

The analytical difficulty lies in the summation of the expressions 
as given above. Laplace was the first to give a compact expres- 
sion for the different sums in a simple and elegant manner. 
By the introduction of the parameter t Laplace writes: 



<p(a) = (p+f/)' = 2; ^, \rq 



'K:y 



,s^a. 



as 



-(:) 



<p{ta) = {tv+qY = ^y^j {tvYq'-\ 

Differentiating with respect to t, which it must be remembered is 
introduced as an auxiliary parameter only, we have: 

<p'{ta) = spitp + ?)'-' = 2a/j yj {tpY-^q'-\ 

Letting t assume the special value 1 the above sum becomes e (a) 
or 

e(a) =Sa y j p^q'-" = sp{p + q)''^ = sp. (L) 

We might, however, have obtained the same result in a much 
shorter manner by the following consideration. The expecta- 
tion for a single event among the s events is equal to p. Since 
all the events are independent of each other, it follows from the 
addition theorem that the complete expectation of the total s 
cases is equal to sp. 

We next proceed to determine the expression: e[a — e{a)] or 
the expected value of the differences between the constant, 
e{a) = sp and the individual values 1, 2, 3, ■ ■ ■, s which a may 
assume in the binomial expansion. 

The difference a — e{a) is known as the departure or devia- 
tion from the expected value, some of these deviations will be 
positive, namely all the values situated to the right of the maxi- 
mum term, which also is the most probable term in the expansion 
{p + q)', while the as situated to the left of the maximum value 
of a will be less in magnitude than the largest a == sp and the 
deviation will therefore be negative. On account of the sym- 
metrical form of the binomial expansion we may expect an 
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equal number of positive and negative deviations which, taken 
two and two at a time, are equal in absolute magnitude. The 
algebraic sum of all the deviations may therefore be expected 
to be equal to zero. We shall, however, in a rigidly analytical 
manner prove that this is actually so. We have 

e[a — e{a)] = 2[q; — e{a)]<p{a) = 2a^(a) — Se(a)^(a) 

= 'Za<p(a) — spZ(p{a). 

The first term in this last expression we found, however, to be 
equal to e{a) = sp, and we have finally: 

e[a — e{a)] = sp — sp = 0. 

By squaring the quantity, a — e{a), we get a^ — 2ae{a) + 
[e(a)]^, which is always positive no matter if the above difference 
is negative. 

As a preliminary step we shall find 

e{a^) = 2aV(a)- 
Introducing the auxiliary parameter, t, we get: 

2(*)(«2')"9°"°= {tp+qY. 
The first derivative with respect to < is: 

2pa ( * j {tpY~\'~'' = sp{tp + g)»-i. 
Multiplying both sides of the equation by tp, we have: 

llpaiipYq^'^ ( a ) "" ■^^F^'-iP + 9)""^- 
Differentiating we get : 

•ZpW ( * \ (tpY'^q'-' = sfitp + g)'-! + s{s - l)pH{tp + q)^. 

Dividing through with the constant factor p and letting t = 1 
we have: 

1,0? ( * ) p"?""" = s'^p'^ + sp{\ — p) = s^p^ + spq. 

The expression on the left side is, however, nothing less than the 
algebraic sum of "La^ipia) or simply e{a^). This leaves the final 
result: 
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e{o?) = s^p^ + spq. 
We have now: 

[a — e{a)f = o? — 2ae{a) + [e(«)]^ 

from which it follows: 

e\a — e{a)f = s^p^ + spq — 2s^p^ + 5^ = spq. 

Denoting this latter quantity by the symbol [eCa)]^ we have: 

[e(a:)]^ = e[(x — eia)]^ = spq, or i{a) = yspq. (II.) 

The quantity «(«) or simply e is commonly known as the mean 
error of the frequency number a in the Bernoullian expansion. 
The mean error is one of the most useful functions in the theory 
of probabilities and furnishes one of the most powerful tools of 
the statistician. 

66. Mean Error of Various Algebraic Expressions. — We next 
proceed to prove some general theorems connected with the 
mean error. The mean error of the sum of two observed vari- 
ables, a and /3, is given by the formula : 

€(a + 13) = Moi) + e=(,3). 

Proof: Let e{a) = i:a<pia) and e(/3) = 'Efi^PilS) 

e\a) = S[a - e(a)]V(a) and e^fi) = S[j3 - e(/3)]V(/3) 

be the respective expressions for the probable values and the 
mean errors of a and |8 where of course 295(a) = land Si/- (/3) = 1. 
Now ^(a„) is the probability for the occurrence of the special 
value a^ of the variable values, in the same way \p(P^) is the 
probability for the occurrence of 13^. If a and /3 are independent 
of each other, then according to the multiplication theorem, 
<p{a^)4'{^u,) represents the probability for the simultaneous 
occurrence of a^ and ;8^ as well as the probability of the occurrence 
of the difference: a„ + /?w — e{a) — e(l3), since the probable 
values e{a) and e(/3) are constant quantities independent of 
either a or j3. 

If € denotes the mean error of a + /3 then it follows from the 
definition of e that e^ = SS[a + /3 - e{a) - e(j3)]V(a)'/'(/3) where 
the double summation is to take place for a'il possible values of 
the variables a and |3. 

The above expression may be written as: 
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e' = 2S[a - e{a) + /3 - e(^)]V(«)'/'W, 



or 

^2 ^ 



e = 22[a - e(a)]V(a)>/'(^) + 22S[a - e(a)][/3 

- em^iam^) + SS[/3 - e03)]V(/3a)i/'(^). 
A mere inspection will satisfy that the first and the last terms of 
this expression equals e^{a) and e^{fi) respectively. The first 
term may be written as follows : 

2[a - e(a)]V(a)2;/'(/3) = e\a) 
since 2i^(^) = 1. The same also holds true for the last term. 
With regard to the middle term we found before that 

e[a — e{a)] = 0. 

Hence it follows by mere inspection that this term becomes 0. 
Thus we finally have : 

e'ia + ^) = e'ia) + e\P) or e{a + /3) = Ve^^S) + e\a). 

Since the middle term is always 0, it follows a fortiori 

e(a - ft = ^t\a) + e\l3), 
also that 

e(ka) = kt{a), 

where ^ is a constant. This gives us the following theorems: 
The mean error of the sum or of the difference of two quantities 
is equal to the square root of the sum of the squares of each 
separate mean error. The mean error of any quantity multiplied 
by a constant is equal to this same constant multiplied by the 
mean error of the quantity. (See Appendix.) 

The above theorems may easily be extended to any number of 
variables: a, ^,y ■ ■ ■ so that in general we have 

6(a + i3 + 7---) = VeV) + e^(^) + 6^(t) + •■■• 

We shall later make use of this formula by a comparison of 
the different rates of mortality among difPerent population groups. 

So far we have computed the mean error for the absolute 
frequencies of a, and the quantity ys-pq was compared with the 
most probable number of successes s-p. But it may also be useful 
to know the mean error of the relative frequencies. This calcula- 
tion is performed by reducing the mean error of the absolute 



108 LAW OP LARGE NUMBERS. [67 

frequencies to the same degree as these absolute frequencies are 
reduced to relative frequencies. We saw before that e(a) = sp. 
The relative frequency of the probable value is e{a)/s = sp/s = p. 
The mean error of p therefore is 

The following remarks of Westergaard are worthy of note: 
'' \Mien a length is measured in meters and this measure may be 
effected with an uncertainty of say 2 meters, the length in 
centimetres is then simply found by multiplication by 100 and 
the uncertainty is 200 cm. When we wish to find the mean 
error of p instead of sp we only need to divide the mean error 
"Vspg by s, which gives ypqjs." 

The same result is also easily obtained from the formula 

e{ka) = ke{(x) 
when we let k = 1/s. 

67. Tchebycheff' s Theorem. — Tchebycheff's brochure ap- 
peared first in Liouville's Journal for 1866 under the title "Des 
valeurs Moyennes." A later demonstration was given by the 
Italian mathematician, Pizetti, in the annals of the University 
of Geneva for 1892. The nucleus in both Tchebycheff's and 
Pizetti's investigations is the expression for the mean error: 

e(^) = m - emMO. (1) 

The variable ^ may be of any form whatsoever, it may thus for 
instance be the sum of several variables: a, 0, y • • • while <p{^) 
is the ordinary probability function for the occurrence of ^. Let 
us denote the difference: ^r — e(^r) by Vrir = 1, 2, 3 • • ■ 5). We 
may then write the above expression for €(^) as: 

H^i) '4 + ^(&) '4 + ^(&) 4+--- ^(u 4 = '-^ (2) 

U/ U tl Ui CI 

where a is an arbitrarily chosen constant, but always larger than 
e(^) in absolute magnitude. If we, in the above equation, select 
all the v's which are larger than a in absolute magnitude together 
with their corresponding probabilities, (p{^) and denote all 
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such quantities by v', v", v'", ••• and ^{0', (p{0", <p{0"', ■ ■ ■ 
respectively, we have evidently: 

a^ ^ a^ ^ a^ '^ " ' ^ a' ^"^^ 

For any one of these different v's which is larger in absolute 
magnitude than a 

iP ^ 

a-' 

from which it follows a fortiori: 

^(9' + <PiO" + ■■■ = S^^(l) < '-^ - (3a) 

In this latter inequality, 2^*(^) is the total probability for the 
occxu-rence of a deviation from e(Q larger than a in absolute 
magnitude. 

Let now P^ be the probability that the absolute value of 
the mean error is riot larger than a; then 1 — P? is the total 
probability that the mean error is larger than a. We have thus 
from the inequality (3a) 

1 — Pt< — T- or Pt> 1 5- - (4) 

Let also a = X€(^). We then have by a mere substitution in the 
above inequality: 

Pr>l-jJ-2. (5) 

This constitutes the first of Tchebycheff's criterions which says: 
The probability that the absolute value of the difference \ a — e{a) \ 

does not exceed the mean error by a certain multiple, X, (X > 1) is 

greater than 1 — (1/X^). 

Now we made no restrictions as to the variable, f, which may 

be composed of the sum of several independent variables, a, /3, 

7, • • ■ . We saw before that 

e\a + P + y+ ■■■) = e^a) + e^O?) + 6^(7) + • • • 

Tchebycheff's criterion may therefore be extended as follows : 

The Tchebycheffian probability, Pt, that the difference | a + ;8 + 7 
_|_ ... _ g^Qj) _ g(^) _ g(-y) — ... I icill never exceed the mean 
error e by a certain multiple, X > 1, is greater than 1 — (1/X^). 
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68. The Theorems of Poisson and Bernoulli proved by the 
application of the Tchebycheffian Criterion. — Bernoulli in his 
researches limited himself to the solution of the problem in which 
the probabilities for the observed event remained constant during 
the total number of observations or trials. Poisson has treated 
the more general case, wherein the individual probability for the 
happening of the event in a single trial varies during the total s 
trials. This may probably best be illustrated by an urn schema. 
Suppose we have s urns Ui, U2, ■ • • Us with white and black 
balls in various numbers. Let the probability for drawing a 
white ball from the urns Ui, U2, ■ ■ ■ f7s in a single trial be 
Pi, P2, • ■ ■ Ps respectively, qi, q^, • • • qs the chances for drawing 
a black ball in a single trial. If a ball is drawn from each urn, 
what is the probability of a drawing a white and s — a black 
balls in s trials? It is easily seen that the Bernoullian Theorem 
is a special case when the contents of the 5 urns and the respective 
probabilities for drawing a white ball in a single trial are the 
same for all urns. 

69. Bernoullian Scheme. — We shall now show how the Tche- 
bycheffian critierions may be used in answering the question 
given above. First of all we shall start with the simpler case 
of the Bernoullian urn-schema. Here the probability for drawing 
a white or a black ball from each of the s urns in a single trial is 
p and q respectively. The square of the mean error in a single 
trial is pq. From the formulas in § 66 it then follows : 

e^ = «i^ + ^2^ + • • • = pq -\- pq -\- pq -\- • ■ • s times = spq 
or 

€ = ■Sspq. 

While the above expression gives us the mean error of the absolute 
frequency of the variable a, the relative frequency of a to the 
total number of trials, s, is given as 

^Ipq 

€ = —=-. 

We now ask: What is the total probability that the absolute 
deviation of the relative frequency a/s from its expected value 
sp/s = p never becomes larger than X times the mean error, 
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€ = -Spqjs? Letting X = ylsft and using the symbols Pt for 
this particular probabiUty, we have according to Tchebycheff's 
criterion : 

Pr > 1 - l/X^, or Pr> I- f/s. 

Since the mean error is equal to 'ypqjs we have: 

Xe = — . 

The answer to our question above follows now a fortiori as 
follows : 

The total probability that the absolute deviation of the relative 
frequency from the postulated a priori probability, p, never 
exceeds the quantity, 'ypq/t, is greater than 1 — (f/s) . 

By taking t large enough we may reduce '^Ipq/t (where pq is 
a fraction whose maximum value never can exceed 1 -f- 4,) below 
any previously assigned quantity, 5, however small. If, for 
instance, we choose the value .0001 for 8, we may rest assured 
that '^pq/t will be less than 5 when we take t larger than 5000. 
But no matter how large t is, so long as it remains a finite number, 
by letting s = <x as a limiting value, f/s will simultaneously 
approach as a limiting value. From the deductions thus 
derived we are now able to draw the following conclusions: 

1) By letting s = x as a limiting value, the probability, Pt, 
that the absolute difference between the relative frequency ajs and the 
postulated a priori probability, p, never becomes greater than ypq/t 
approaches 1 or certainty as a limit. 

2) By choosing the quantity, t, which is less than lim \s, suffi- 

«=00 

ciently great, we may bring ypqjt below any previously assigned 
quantity, 6, or make the difference between p and ajs as small as we 
please. 

From these conclusions we obtain a fortiori the following 

lim- = p. 

This constitutes the essential features of the Bernoullian Theorem. 
70. Poisson's Scheme. — Let pi denote the postulated prob- 
ability for success in the first trial, pi in the second, pz in the 
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third, etc., and let furthermore qi, (/■•, 53, • • • be the respective 
probabilities for the corresponding failures. If the trial (observa- 
tion) is repeated s times we obtain the following values for the 
probable or expected value of the frequency for successes e{a) 
and the mean error e 

c{a) = pi + pi + P3+ • • -ps = -Pi, 



£ = -^piqi + ^292 + pm + ■ ■ ■ Psqs = Vl>,(y, ((' = 1, 2, 3, • • -s) 

If by po and qo we denote the arithmetic mean or the average 
value of the s p's and s q's, such that 

Pi + P2 + P3+ ■■■Pa ,„, 

Po = ^: (3) 

gi + ?2 + g3 + ■■■ qa ... 

90 = (4) 

and assume that po and 50 denote the constant probabilities 
during each of the s trials (observations), we should according 
to the Bernoullian Theorem have : 

e{aB) = spo (5) 

«(«£) = ■^spoqo (6) 

where a^ stands for the absolute frequency in a Bernoullian 
series. 

An actual comparison of (1) and (5) and (3) shows that: 

e{ap) = eiae) (7) 

where aj> is the symbol for the absolute frequency in a Poisson 
series. In other words: If the s trials had been performed with 
constant probability for success equal to po instead of with 
varying probabilities pi, p2, ■ ■ ■ /)», the expected or probable 
value would be the same for the Bernoullian and Poisson scheme. 
With regard to the mean error we find, however, after a little 
calculation, 

e/(«) = e^a) - S(i;, - po)\ (8) 

The expression for the mean error in Poisson's Theorem is of 
the following form 



ep = -^piqi + p^qi + p^q^ + ■ • -piqi = VSpi^,- {i = 1, 2, 3- • •«). 
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Now piqi may be transformed as follows: 
Writing 

pi = Po + ipi — Po) 

ii = 9o — {pi — Po) 
and multiplying we obtain: 

Pt<li = Pm — {pi - Po)(po — go) - {Pi — Po)^ 

and summing up for aU values of ;' from i = 1 to i = s we have: 

ep- = *Pogo — -(p.- — Poy = tB- — Z{pi — Po)-. 

As {pi — Po)' alvN'ays is a positive quantity, it is readily seen that 
the mean error in a Poisson scheme is always less than the mean 
error in the corresponding BernouUian series. 
Writing e as follows: 

e = Vpigi + p2q2 + • ■ • + p^q^ 



= ^syj' 



Pl+ ■ ■ ■ +Pb Pl^+ ■■■ + Ps- 



and letting X = V^f, we have according to TchebychefF's The- 
orem the following rule: The probability Pr that the relative 
frequency remains inside the limits: 



pi + P2+ ■■■ Ps ^ :^J^\ 



Pi + Pi + ■ ■ ■ + Ps 



-w 



^ Pi + P2+ ■■■ + p. _ Pl^+P2^ + ■■• + ps 



is greater than 1 - (1/X-) or 1 - (f/s). 

By taking t sufficiently large and by letting s approach infinity 
as a limiting value the last term in the above difference, namely 
the average probabihty, po, and X times the mean error, becomes 
smaller than any previously assigned quantity, S, however small, 
while Pt at the same time will approach 1 as a limit. 

From this it now follows: 

When an infinite number of trials is made on an event, following 
the scheme of Poisson, then the expression: 

,. a pi + P2+ ■■■ + Pb 
hm— = = Po. 

.=a.S S 
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The essential part of Poisson's Theorem is contained in this 
equation. When p = pi = p2 = • • • ps 'we have a Bernoulhan 
series and obtain: 

Hm- = p, 

8=00 ^ 

which result we already derived above in a direct way. 

71. Relation between Empirical Frequency Ratios and Mathe- 
matical Probabilities. — In the above limit, a indicates the total 
number of lucky events while s is the total number of trials, the 
quotient a -f- s then is nothing more than the empirical prob- 
ability as defined in the preceding paragraphs. Both the 
Bernoullian and Poisson Theorems show that this empirical 
probability approaches the postulated a priori probability, p, 
(or the average probability po) as a limiting value. 

In this way we have succeeded in extending the theory of 
probability to other problems than the conventional kind involved 
in the games of chance or drawings of balls from urns. We do 
not need to limit our investigations to problems where we are 
able to determine a priori the probability for the happening of 
an event in a single trial, but limit ourselves to postulate the 
existence of such an a priori probability. 

A large number of trials or observations is made on a certain 
event E. This event is now observed to have occurred a times 
during the s total trials. To illustrate: An urn contains red 
and white balls, the total number of balls being unknown, a 
single ball is drawn and its color noted. This ball is replaced 
and the contents of the urn is mixed. A second drawing is 
made and the color of the drawn ball noted before the ball is put 
back in the urn. Let this process be repeated s times, where s 
is a large number, and furthermore let a be the number of red 
balls which appeared during the s trials. 

The quotient a -^ 5 we now call the empirical or a posteriori 
probability for the observed event, in this particular case the 
a posteriori probability for the drawing of a red ball. When 
5 = 00 the Bernoullian Theorem tells us that the empirical 
probability foUnd in this manner and the postulated a priori 
probability whose numerical value, however, was unknown 
before the drawings took place, are identical as far as numerical 



t . 
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magnitude i> concerned. As we already ohser^-ed in the intro- 
ducTorj' remarks to this chapter it is impossible to perform a 
certain experiment an intinite mmiber of times, and it is therefore 
out of the qiiesrion to determine the limiting and ideal value of 
the iH^sieriori probability, and we must satisfy ourselves ^"ith an 
approximation by performing a finite nimiber of trials, or let ^« 
be a finite mmiber. The quotient a -^ .* is then the empirical 
approximate a posteriori probability. We know also that al- 
though this quotient is an approximation of the posTidated a 
priori probability only, that by increasing .< or what amounts 
to the Siime thing, by making a large niumber of trials, the dif- 
ference between the approximate empirical probability ratio, 
Q -i- *, and the a priori probability, p, becomes smaller as the 
niunber of trials is increased. Bvit how small is the difference? 
Or how many times shall we repeat the trials (observations so 
that, for practical piu-poses, we may disregard this difference? 
It does not suffice to be satistieii with the fact that the difference 
becomes proportionately smaller the greater we make the number 
of trials and merely insist that in order to avoid large errors it is 
only necessary to operate with very large nimibers. Immediately 
the question arises: Wnat constitutes a large number? Is 100 
a large ntmiber. or is 1.000. 10.000, 100.000 or even a milli on an 
answer to tliis question? As long as this question remains 
unanswereii. it helps but Uttle to poke upon the "law of large 
munbers." a tendency which unfortunately is too manifest in 
many statistical researches by amateiu- statisticians. As long 
as a definition, much less than a nimierieal determination of the 
ranee of "small mmibers"' is lacking, little stress ought to be 
laid on such remarks based in the metaphorical terms of "small" 
and "large" numbers. 

72. Application of the TchebycheflSan Criterion. — It is readily 
seen that even a rough qr.antitive determination of the difference 
between the approximate a posteriori probabUity and the 
postulated a priori probability based upon the mere vague state- 
ment of "large numbers' is utterly impossible, and it remains " 
to be seen, therefore, if the theory" of probability' offers tis a 
criterion that might serve as a preliminarj' test for the above 
diiierence. To restate oxir problem: If p is the p rulaini a priori 
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probability and a -i- s is the empirical probability {a posteriori) or 
relative frequency of the event, E, what is the probability that the 
difference, \ (ajs) — p | does not exceed a previously assigned quantity^ 
In the mean error and the associated theorem of Tchebycheff 
we have a simple and easily applied criterion to test this prob- 
ability. 

Tchebycheff's rule states that the probability, Pj-, of a devia- 
tion of a variable from its probable value, not larger than X 
times its mean error, is greater than 1 — (1/X^). 
For 

X=3 Pt> I- ^ = 0.888 

X = 4 Pt> I - j\ = 0.937 

X = 5 Pr > 1 - 2T = 0-96. 

This shows that a deviation from the expected or probable 
value of the variable equal to 4 or 5 times the mean error possesses 
a very small probability and such deviations are extremely rare. 

Let us for example assume that the observed rate of mortality 
in a certain population group is equal to .0200. Let furthermore 
the number exposed to risk equal 10,000. The mean error is 

(02X 98\^ 
^ ) ~ -0014. If the number of lives exposed to risk 

was one million instead of 10,000, the mean error would be 

(02 X 98\^ 
' ^ — J = .00014. A deviation four times this latter quantity 

is equal to .00056, and according to Tchebycheff's criterion the 
probability for the non-occurrence of a deviation above .00056 
is greater than .937, or the probability of dying inside a year will 
not be higher than .0206 or less than .0194. For an observation 
series of 4,000,000 homogeneous elements we might by a similar 
procedure expect to find a rate of mortality between 0.02 + 
0.00028 or 0.02 - 0.00028. Thus we notice that the mean error 
of the relative frequency numbers decreases as the number of 
observations increases. 



CHAPTER X. 

THE THEORY OF DISPERSION AND THE CRITERIA OP LEXIS 
AND CHARLIER. 

73. Bemoullian, Poisson and Lexis Series. — In the previous 
chapter we limited our discussion to single sets consisting of 
s individual trials and found in the mean error and the criterion 
of Tchebycheff a measure for the uncertainty with which the 
relative frequency ratio a/s as well as the absolute frequency 
a were affected. How will matters now turn out if, instead of a 
single set, we make N sets of trials? As already mentioned in 
paragraph 54, in general in N such sets we shall obtain N dif- 
ferent values of a, denoting the absolute frequency of the event 
represented by the sequence 

di, ai, az, ■ ■ ■ a^. 

Our object is now to investigate whether the distribution of 
the above values of a around a certain norm is subject to some 
simple mathematical law and if possible to find a measure for 
such distributions. 

In this connection it is of great importance whether the pos- 
tulated a priori probabilities remain constant or not during the 
'N sample sets. Three cases are of special importance to us.^ 

1. The probability of the happening of the event remains 
constant during all the iV sets. The series as given by the ab- 
solute frequencies in each set is known as a Bemoullian Series. 

2. The same probability varies from trial to trial inside each 
of N sample sets, the variations being the same from set to set. 
The series as given by the absolute frequencies is in this case 
known as a Poisson Series. 

3. The probability remains constant in any one particular set 
but varies from set to set. The absolute frequency series as 
produced in this way is called a Lexis Series. 

The above definition of these three series may, perhaps, be 
made clearer by a concrete urn scheme. 

' The terminology in due to Charlier. 
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A. BernouUian Series. — s balls are drawn one at a time from an 
urn, containing black and white balls in constant proportion during 
all drawings. Such drawings constitute a sample set. Let us in 
this particular set have obtained say ai white and /3i black balls, 
where ai + (3i = s. We make N sets of drawings under the 
same conditions, keeping a record of white balls drawn in each 
set. The number sequence thus obtained. 

Oil, CLi, as, • • • a^. 

is a BernouUian Series. 

B. Poisson Series. — s individual urns contain white and 
black balls, the proportion of white to black varying from urn 
to urn. A single ball is drawn from each urn and its color noted. 
In this way we get ai white and j8i black balls constituting a set. 
The balls thus drawn are replaced in their respective urns and a 
second set of s drawings is performed as before, resulting in aj 
white and ^2 black balls. The number sequence, 

ai, a2, as, ■ ■ ■ a^f, 

of white balls in N sets represents a Poisson Series. 

C. Lexis Series. — s balls are drawn one at a time under the 
same conditions as set No. 1 in the BernouUian series. The a\ 
white and /3i black thus drawn constitute the first set. In the 
second and following set the composition of the urn is changed 
from set to set. The number sequence representing the number 
of white balls in the N respective sets: 

Oil, Oil, OLz, ■ ■ ■ Oijf 

is a Lexian Series. The scheme of drawings is the same as in 
the BernouUian Series except that the proportion of white to 
black balls varies from set to set. 

74. The Mean and Dispersion. — Since we have no a priori 
reasons for choosing any one particular value of the various a's 
of the above sequences in preference to any other, we might give 
equal weight to each set and take the arithmetic mean as defined 
by the formula: 

of the N values of a.. 
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I-t will be unnecessary to enter into a detailed discussion of 
the mean, which is a quantity used on numerous occasions in 
every day life. We shall, however, define another important 
function known as the dispersion (standard deviation). The 
dispersion is denoted by the Greek letter, a, and is defined by 
the formula 

„ (ai - MY +ia2- My+ ■■■ (a^ - M)^ 



N 



(11) 



We shall now attempt to find the expected value of the mean 
and the dispersion in the three series. First of all take the 
BernouUian Series. Let the constant probability for success in 
a single trial be po. We have then for the various expected values 
or mathematical expectations of a: 

Set No. 1: e(ai) = spo 

Set No. 2: e{ai) = spo 



Set Noo N: e{aif) = spo 

or: 

e(ai) + e(a2) + • • • + eia^) Se(a^) Nspo 



spo, 



N N N 

which shows that the mean in a BernouUian Series of N sample 
sets is equal to the expected value of the absolute frequency in 
a single set. 

In regard to the dispersion we have for the various sets: 

Set No. 1: e{ai - M^ = e^(ai) = spoQo 

Set No. 2: e{<X2 - Mf = t^a^) = sp^qo 



Set No. N: e(a^ - Mf = e^aj^) = spoqo 

Summing up and forming the mean we obtain for the expected 
value of the dispersion in a BernouUian Series, which we shall 
denote by the symbol o-^: 



2 



S€^(q;^) Nspoqo 



<^B = ]y^ ^ ^ = Spoqo- 



120 THE THEORY OF DISPERSION. [73 

This result shows that the dispersion in a Bernoullian Series is 
equal to the mean error, e, in a single set. 

We now proceed to the Poisson Series. Let pi be the mathe- 
matical probability of the happening of the event in the first 
trial, j)2 be the probability in the second trial and so on for all 
trials, and let us furthermore denote the means of the p's and 
q's by: 

Pi + P2 + ps ■•• + p» 



9o = 



s 
qi + q2 + qs ■■• + Qs 



s 

Applying a similar analysis as above we have: 

Set No. 1 : e(ai) = pi + p2+ ■ ■ ■ + Ps = spo 
Set No. 2: e(a2) = pi+ P2+ ■ ■ ■ + Ps = spo 

Set No. N: e{aif) = pi + ^2 + ■ ■ ■ -\- p^^ = spo 

The actual summation of the above values of e{oi) gives us the 
following value of the mean in a Poisson Series: 

Mp = spo. 

Let us for a moment assume that all the drawings had been 
performed with a constant probability, po. According to the 
Bernoullian scheme we should then have: 

Ms = spo. 

An actual comparison shows that M^ = Mp. This shows that 
the same mean result is obtained if we draw s balls from the urns 
Ui, U2, • ■ ■ Us with their corresponding probabilities pi, p2, • ■ ■ p, 
for drawing a white ball, as would be obtained if we drew all the s 
balls from a single urn where the composition is such that the 
ratio of the number of white to that of black balls is as po : Q'o, 
where po and go are defined as above. 

Let us now see how matters turn out in regard to the dispersion. 
We have for the N sets: 

Set No. 1: e{ai - Mf = piqi + p^q^ + • • • = Sp,g„ = €^(0,^) 
Set No. 2: e(a2 - Mf = p^qx + p^q^ + • • • = Sp,g, = t^a^) 

Set No. N: e{at^ - My = piqi + ^2^2 + ■ • • = 2p,g, = e\a,^) 
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In § 70 we showed, however, that "Sp^Qy could be expressed 
as follows: 

€p^{a) = spoqo - 2(p, - poV = e/ia) - Xip^ - po)\ 

A simple straightforward calculation gives us now for the 
dispersion, a-p, 

(7p2 = 0-^ - S(p^ - poY, 

In the corresponding Bernoullian Series with constant proba- 
bility, po, the dispersion is equal to sp^qo, which shows that the 
dispersion in a Poisson Series is less than the corresponding 
dispersion of the Bernoullian Series. 

We finally come to the mean and the dispersion in the Lexian 
Series which we shall denote by ikf^ and ff^ respectively. Let us 
furthermore define the two quantities po and go as follows : 

Pi + Pi+ \-Pn 

Po= ^ , 

gi + g2 + h gjv 

90= ^ 

A computation along similar lines as above gives us first for 
the mean, ilf^: 

Set No. 1 : e(ai) = spi 

Set No. 2: 6(0:2) = sp^ 

Set No. A'': e(aj^) = spj, 

Thus we have : 

Se(a,) Ssp, s[pi + P2 + ■ ■ ■ pA 
M, = -^^ = ^^ = ^ = ^Po. 

For the dispersion we have the following expectations: 

Set No. 1 : eispo - ai)^ 

Set No. 2: e{spo - a^Y 

Set No. N: e{spo - aj^Y 

The expected value in the fth set is 

e{spo — a^y = S(5po — a^)Vr(a), 
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where (pXa) is the general term in the probability binomial: 
iVv + q^y =1- An analysis along similar lines as in § 65 
gives us now: 

eisfo — a^y = s^pa^ — 2s^pop, + ^p^ + sp.q^ 

= sp,q, + s^ip^ - pof 

as the expected value of the square of the difference between the 
mean and the absolute frequency in the j/th set. For all N 
sets we then have 

o-i- = —]^ + ]^^iP^- Vof- 

We have, however, the following identity : 

Sp„g, = Npaqo - S(p, - po)^ 



and hence 



2_ "•«'-* 



+ -^^S(p.-po)^ 



74a. Mean or Average Deviation. — Of quite another character 
than the standard deviation or dispersion is the so-called mean 
or average deviation, t>, defined by means of the following 
relation: 

|ai-ilf! + |«2--M"| + |a3--M"|+ V\aK- M\ 

^= N ' 

where | a^ — ilf | means the absolute difference between m, and 
M. We shall now proceed to determine the expected value of 
i} on the assumption that the observed data follow the Bernoullian 
Law. The mean in a Bernoullian series with constant prob- 
ability po we found before to be equal to spo which was the 
expected value of a in a single sample set of s trials. The 
expected value of the absolute difference in the yth set is therefore: 

e\a^ — spo\ = "Zilay — spo] <pv(a), 

where as usual <Pria) is the binomial probability function. 

The deviations from spo are partly positive and partly negative. 
We proved, however, before that 

e(a„ — spo) = S(a^ — spo)<pXa) = 0. 
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Hence it is readily seen that the algebraic sum of the positive 
deviations cancel the algebraic sum of the corresponding nega- 
tive deviations so that e\a^ — spo\ equals twice the sum of 
the positive deviations. Positive deviations occur for values 
of a greater than spo, i. e., for all values which a may assume 
from s to 5^0 in the binomial expansion: (po + 9o)'.' Hence we 
have (omitting subscripts) : 



p«g8-a 



e I a — 5p I = 2 ^ (a — 5p) ( ) 

The second of these sums represents the following function of 
p and q 

fip, q) = P'+ ([) P'-'q + (2 ) ?*"¥ + ■ ■ " + (/g) P'"^"- 

By partial differentiation in respect to p and by following 
multiplication by p we have: 

df 



p£=sp'+is-l)(\) p'-^q+ (s- 2) ( 2 ) P-V+ 



+ 



sql ^ ^ 



p'g'""- 



Hence we may write: 

PQ^-^Pf 

Furthermore f{p, q) is a homogenous function in respect to p 
and q of the sth order. We may then apply the following well 
known Eulerian Theorem from the differential calculus: If 
f{p, q) is homogenous and has continuous first partial derivatives 
then 

, df , df 

'f=^dp+'^dq 

Using this relation we may write : 

.[„-,pI = 2{,|-.,/}=2M{|-|f 

' Spo is taken to the nearest integer. 
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The partial derivatives of /(p, q) with respect to p and q are 
of the form: 

dp 'P ^*^* ^^^ + ^ 1-2-3--- (5?-l) ^ ^ 

g(^ - 1) • • • sp 
^ l-2-3---sq ^ ^ 

a? ^ ^*^* ^^ ^ ^ 1-2-3 ••• (*g- 1) ^ ^ 

Hence we have: 

df df sis - 1) ■■■ sp , r s 

dq dq l-2-3---sq ^ ^ [splsql^^' 

We proved, however, in § 63 that the expression inside the 
bracket may be written approximately as follows: 

T = ' 



'\l2-n-spq 
This gives us finally (again using the subscripts) : 



e\a^ — spo\ = 2spoqoTm = \j — " 



2spoqo 



as the expected value of the absolute deviation in the vth sample 
set. This same relation evidently holds true for any other of 
the N sample sets, which finally gives us the following result for ??: 



12 , 

\lspoqo. 



The dispersion in a Bernoullian series we found before to be 
of the form: 

<Ts = -^spoqo. 

Hence we have the following relation between the dispersion 
and the mean deviation: 



-4'= 



75. The Lezian Ratio and the Charlier Coeflacient of Dis- 
turbancy. — The results given in the last few paragraphs may be 
embodied under the following captions. 
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1. The mean in a Poisson and Lexis Series is the same as the 
mean in a Bernoullian Series with constant probability of po 
in a single trial, where po is defined as above. 

2. The dispersion in a Poisson Series is less than the corresponds 
ing dispersion in a Bernoullian Series. 

3. The dispersion in a Lexis Series is greater than the dispersion 
in a Bernoullian Series. 

The mean and the dispersion of the Bernoullian Series occupy 
in this connection a central position and may be used as a standard 
of comparison with other series. This is the method adopted by 
Lexis in investigating certain statistical series, and we shall re- 
turn to it In the following chapter. Lexis determines first 
in a direct manner the dispersion as defined by formula (11) 
from the statistical data as given by the number sequence a. 
This process is known as the direct process (by Lexis called a 
physical process) and gives a certain dispersion, tr. After this 
the dispersion is computed by an indirect (combinatorial) 
process under the assumption that the series follows the Ber- 
noullian distribution. The ratio, a : ffg, which Charlier calls 
the Lexian Ratio and denotes by the symbol, L, may now give 
us an idea about the real nature of the statistical series as 
represented by the number sequence. 

When i = 1, the series is by Lexis called a normal series. 

When i > 1, the series is called hypernormal. 

When L < 1, the series is a subnormal series. 

It is easily seen from the respective formulas that the Poisson 
Series are subnormal series whereas the Lexian Series are hyper- 
normal. The great majority of statistical series are — as we 
shall have occasion to see in the following chapter — of a hyper- 
normal kind and correspond thus to the Lexian Series. 

In § 74 we found the dispersion in the Lexis series as 

a-L = <^B + if - s)<Tp^, 
where 

S(p. - p,? 



(Tv = 



N 



The quantity, ap, is the natural measure of the variations in 
the chances from the mean or normal probability, po. It is 
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however, dependent on the absolute values of these chances, so 
that if all chances are changed in the same proportion, <Tp is 
also changed in the same proportion. Another drawback which 
influences the Lexian Ratio is the variations of the number s 
in each sample set. In order to overcome this difficulty Charlier 
divides the above quantity Cp by po- Assuming that the vari- 
ations in the individual probabilities within each set are of no 
perceptible influence on the dispersion, we have from the Lexian 
dispersion: 



y' - <Ti 



Neglecting s in comparison with s^ and remembering that 
Mg = spo, we have as an approximation: 



Tp V^ 



Po Mb 

Charlier calls the quantity lOOp the coefficient of disturbancy of 
the statistical series. It is readily seen that the Charlier coef- 
ficient is zero in normal series. For hypernormal series it is a 
positive real quantity whereas for subnormal series p is imaginary. 



CHAPTER XI. 

APPLICATION TO GAMES OF CHANCE AND STATISTICAL 
PROBLEMS. 

76. Correlate between Theory and Practice. — In the theo- 
retical analysis just completed we treated the fundamental ele- 
mentary functions in the theory of probabilities, the probability 
function, the expected or probable value of a variable quantity, 
the mean error, the dispersion and the coefficient of disturbancy. 
The formulas thus derived were founded upon certain hypo- 
thetical axioms, which formed the basis of a mathematical a 
priori probability as defined by Laplace. As far as the purely 
abstract mathematical analysis is concerned it matters but little 
if the hypotheses are physically true or not, that is to say, if 
they agree with physical facts in the universe as it is known to 
us. A mathematical analysis may be made on the basis of 
widely divergent hypotheses, a fact which is clearly shown in 
the Euclidean and Non-Euclidean geometries. It is, however, 
quite a different matter when we wish to apply our theory to 
actual phenomena (physical observed events) as it is evident 
that a correlation between hypothesis and actual facts follows 
by no means a priori. It is, of course, true that the different 
hypotheses in the theory of probabilities are derived to greater 
or less extent from outside sense data. Such sense data, however, 
give us only the effect and no clue whatsoever to the relation 
between cause and effect. In the application of our theory every 
hypothesis — or rather the results derived from such hypothesis 
— must be verified by actual experience. Before such a veri- 
fication is made, we advise the reader to be sceptical and not 
trust too much in the authority of others but follow the sound 
advice of Chrystal: "In mathematics let no man over-persuade 
you. Another man's authority is not your reason." We can so 
much more encourage an attitude of scepticism in view of the 
fact that even among the leading mathematicians of the present 
time there exists no uniform opinion as to the truth of the 
axioms underlying the theory of probabilities. 

127 
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77. Homograde and Heterograde Series. Technical Terms. 

— Whenever a common characteristic or attribute of several 
groups of observed individual objects or events allows a purely 
quantitative determination, it may be made the subject of a 
mathematical analysis and in such cases we are often able to 
make excellent use of the theory of probabilities. Such quan- 
titative measurements may be divided into various domains 
of classification. Traces of such classification are found in almost 
every treatise on mathematical statistics but a uniform system 
nomenclature is unfortunately lacking among the various 
statisticians and any one reading the modern literature on mathe- 
matical statistics notices often various inconsistencies of the 
different authors. Mr. G. Udny Yule in his excellent treatise 
"Theory of Statistics" classifies the statistical series into "sta- 
tistics of attributes" and "statistics of variables." Apart from 
the fact that Mr. Yule's statistics of variables also is a statistics 
of attributes — although of different grades — the author appar- 
ently ignores the criterion of Lexis and the associated criterion 
of Charlier. The German writers use the terms "stetige und 
unstetige KoUektivgegenstand " (continuous and discontinuous 
collective objects), which were originally introduced by Fechner. 
Other writers, such as Johannsen of Denmark and Davenport 
of America, use still other terms. After having made a com- 
parison of the various systems of classification I have in the 
following decided to adhere to the system of Charlier wherein 
the observed statistical series are classified as homograde and 
heterograde. 

If the individuals all possess the same character or attribute 
in the same grade (intensity) — or if we disregard the different 
grades of the attributes — such individuals are called homograde, 
and the statistical series thus formed is a homograde series. If 
on the other hand we take into consideration the different 
varying grades of the attributes observed or measured and form 
the series accordingly we obtain a heterograde series. As examples 
of homograde series we may mention the observed recorded 
series of coin tossing, card drawings in reference to a specified 
event, number of births or deaths in a population group, etc. 
A coin when tossed will either show head or tail, a person will 
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either be dead or alive. There are no intermediate degrees as 
for instance that of a half dead person. In all such series the 
dividing line between the occurrence of the event (attribute) E 
and the occurrence of the opposite event E is distinct and suggests 
itself a priori and there is no doubt as to the classification of the 
observed event. 

The original record of observation of a homograde series — also 
known as the primary list — is simply a record of the presence or 
non-presence of a specified attribute of the individuals belonging 
to the group under observation and is of the following form: 





Primakt List 


OF Homograde iNDiviDUAiiS. 








Attribute. 




Symbol for the IndiTidaal 


Present {E). 




Non-present {M). 


h 




1 






h 








1 


h 








1 


li 




1 






h 




1 







In this scheme the individuals /i, li and Js possess the attribute 
E while the individuals h and I3 do not have this attribute. 

In observing the presence of a specified attribute in a group of 
individual objects we meet, however, frequently series of quite 
another nature than the simple homograde series. When in- 
vestigating the different measures of heights of persons inside a 
certain population group no simple dichotomous (i. e., cutting 
in two) di\'ision in two opposite and mutually exclusive groups 
suggests itself a priori. It is of course true that we might divide 
the total population under observation into two subsidiary groups 
of tall individuals and short individuals. But the question then 
immediately arises, What constitutes a short or a tall person? 
The answer must necessarily be arbitrary. Persons above the 
height of 170 cm. may be classed as tall while persons falling 
short of such measure may be classed as short persons, and we 
might in this way form a primary homograde table of the form 
as given above. There is no logical reason, however, to choose 
the quantity 170 cm. as the dividing line and comparatively 

10 
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little value would result from such a classification. It is evident 
that all persons belonging to groups of tails or shorts are not iden- 
tical as to the particular attribute in question. The height is 
merely a characteristic which varies with each individual and no 
two individuals have mathematically speaking the same height. 
If we take into consideration the different grades of height among 
the individuals and arrange the primary table accordingly we 
obtain a heterograde series of observations. The general form 
of the primary table of such series is: 

Pkimakt List of Hetebogeade Inbividttals. 
Symbol for the Individual. Grade of Attribute, 

/i Si 

Is X3 

II Xi 

Is Xi 



Here the quantities Xi, Xi, • • • Xn give the measures (in kilo- 
gram, liter, meter, etc.) of the characteristic in question.^ 

As examples of heterograde series we may mention the lengths, 
volumes or weights of animals, plants or inorganic objects; 
astronomical observations as to the brightness of celestial objects; 
meteorological records of rainfall, temperature or barometer 
heights ; the frequency of deaths among policyholders as to 
attained age in an assurance company; duration of sickness or 
disablement, etc. 

The investigation of heterograde series is a problem of which 
we shall treat later under the theory of errors or frequency curves. 
The homograde series may, however, be explained fully by means 
of the Bernoullian, Poisson and Lexian Series as founded on the 
mathematical theory of probabilities in the previous chapters. 

78. Computation of the Mean and the Dispersion in Practice. 
— It would be superfluous to enter into a detailed demonstration 
of the practical calculation of either the mean or the dispersion 

1 It is to be noted that in the homograde series the primary list is given by 
abstract numbers while the heterograde series consists of concrete numbers. 
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were it not for the fact that this calculation is performed with a 
lot of unnecessary and useless labor by the untrained student and 
even by many professional statisticians. By the ordinary school 
method the number zero is chosen as the starting point and all 
the variables are expressed in their absolute magnitudes, i. e., 
their distance from 0. In this way one often encounters mul- 
tiplication and addition of large numbers. The Danish biologist 
and statistician, W. Johannsen, has illustrated the futility of this 
method in the following example taken from his treatise " Forelses- 
ninger over Laeren om Arvelighed" (Copenhagen, 1905).^ Dr. 
Petersen, the director of the Danish Biological Station, counted 
the tail fin rays of 703 flounders (Pleuronectes) caught around the 
neighborhood of the Skaw. The observations follow : 

Number of rays: 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 

No. of flounders: 5 2 13 23 58 96 134 127 111 74 37 16 4 2 1 

The ordinary way of computing the mean would be as follows : 

[5 X 47 + 2 X 48 + 13 X 49 + h 1 X 61] ^ 703, 

where 703 is the total number of individuals under observation. 
In Chapter X we gave the following formula for the mean: 

Hf _ '^1 + "^^ + ™3 + • • • + ^jy ,.^ 

This formula may evidently be written as follows: 

mi - Mo + mi - Mq + ms - Mq + 1- my - Jlf p 

^ - N 

(2) 
, ,, 2(m^ — Mo) , ,, 7 , ,, 
+ Mo= ^ + Mo=b + Mo. 

In this expression Mo, which Charlier calls the provisional mean, 
is an arbitrarily chosen number. To show how the introduction 
of this quantity actually shortens the calculation of the mean 
we return to the above quoted series of observations of tail fin 
rays of flounders. 

1 German edition "Elemente der exakten Erblichkeitslehre" (Jena, 1913), 
page 11. 
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Number of Rays (x) in 703 Flotindeks According to Observations of 

Dr. Petersen. 

N = SF(x) = 703, Mo = 53. 





Frequency 










X. 


= J'(x) 


X — 


Mo. 


(X - Mo)F(,x). 


47 


5 


-6 




- 30 




48 


2 


-5 




- 10 




49 


13 


-4 




- 52 




50 


23 


-3 




- 69 




51 


58 


-2 




-116 




52 


96 


-1 




- 96 




53 


134 




+0 




+ 


54 


127 




+1 




+ 127 


55 


111 




+2 




+222 


56 


74 




+3 




+222 


57 


37 




+4 




+148 


58 


16 




+5 




+ 80 


59 


4 




+6 




+ 24 


60 


2 




+7 




+ 14 


61 


1 




+8 




+ 8 


Sum = S 


703 






-373 


+845 


We have now: 













b = (845 - 373) -f- 703 = 0.67, M = Mo + b = 53.67. 

The method is quite simple and needs hardly any explanation. 
From a cursory examination of the material we notice that the 
mean is situated in the neighborhood of the series consisting of 
53 rays. We choose therefore the provisional mean. Mo, as 53. 
We next form the algebraic differences of a; — Mo. These dif- 
ferences are then multiplied by i^(.T). The algebraic sum of 
these products divided by iV = "EFix) gives us the value of b, 
which quantity added to Mo gives the value of the mean, M. 

To show a slightly modified form of the method we take the 
following observations of coal-mine accidents in Belgium, covering 
the period 1901-1910, from "Annales des Mines de Belgique." 
These data I have reduced to a stationary population group of 
140,000 mine workers. In other words the quantity s as defined 
in § 83 is equal to 140,000. 
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Number {m) op Persons Bjlled in Coai, Mine Accidents in Belgium, 

1901-1910. 
s = 140,000, N = 10, Mo = 140. 



Tear. 


m. 


m — Mo, 


(m - Mi,)'. 


1901 


164 


+24 


576 


1902 


150 


+10 


100 


1903 


160 


+20 


400 


1904 


130 


-10 


100 


1905 


127 


-13 


169 


1906 


133 


- 7 


49 


1907 


144 


+ 4 


16 


1908 


150 


+ 10 


100 


1909 


133 


- 7 


49 


1910 


133 


- 7 


49 


Sum = S 




-44 +68 


1608 



6 = (68 - 44) + 10 = 2.4, M = 140 + 2.4 = 142.4. 

In this example probably it would have been easier to have formed 
the sum 2m„ directly and then obtained the mean by division 
by 10. The actual formation of the algebraic sums of rriy — Mq 
however, greatly facilitates the calculation of the dispersion, a, 
to which we now shall turn oiu- attention. 
The formula for the dispersion 

^2=2(m^^\,= 1^2,3, ...iV) (3) 

may evidently be written as follows: 
, (mx - MoY + (m2 - MoY + ■■■ + {m^- M^f 



-62 



N 

-6' 



2(m, - Mor 



(4) 



X 



where b as usual means M — Mo, Mo being the provisional mean. 

For Belgian coal mine accidents we thus obtain from the above 

data: 

0-2 = (1608 +- 10) - 5.76 = 155.04. 

Where the number of observed individuals is very large an 
arrangement as that given above for the Belgian statistics becomes 
too bulky and it is therefore customary to group the observations 
in classes as for instance in the example of Dr. Johannsen. The 
dispersion is then computed according to the following elegant 
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[78 



method due to Charlier from whose brochure "Grunddragen af 
den matematiska Statistiken" ("Rudiments of Mathematical 
Statistics") I take the following example: 

Ntjmbeb op Boys (m) per 500 Children Bobn in 24 Provinces op Sweden 
DTJRiNG Each Month in 1883 and 1890. 





s = 


500, 


N = 576, 


Mo = 257, 


w = 5. 




( 


Class. 




Frequency 
= nx). 


xFXx). 


xfF(,x). 




Limits 

m. 


Number. 

= X. 


{x+l)'F[x). 


200-204 


— 


11 


1 


- 11 


+ 121 


100 


205-209 


— 


10 














210-214 


— 


9 














215-219 


— 


8 


1 


- 8 


+ 64 


49 


220-224 


— 


7 


2 


- 14 


+ 98 


72 


225-229 


— 


6 


5 


- 30 


+ 180 


125 


231-234 


— 


5 


13 


- 65 


+ 325 


208 


235-239 


— 


4 


18 


- 72 


+ 288 


162 


240-244 


— 


3 


47 


-141 


+ 423 


188 


245-249 


— 


2 


60 


-120 


+ 240 


60 


250-254 


— 


1 


81 


- 81 


+ 81 





255-259 







108 








108 


260-264 


+ 


1 


91 


+ 91 


+ 91 


364 


265-269 


+ 


2 


60 


+ 120 


+ 240 


540 


270-274 


+ 3 


44 


+ 132 


+ 396 


704 


275-279 


+ 4 


22 


+ 88 


+ 352 


550 


280-284 


+ 


5 


16 


+ 80 


+ 400 


576 


285-289 


+ 


6 


6 


+ 36 


+ 216 


294 


290-294 


+ 


7 














295-299 


+ 


8 














300-304 


+ 


9 


1 


+ 9 


+ 81 


100 



Suin = S 



576 



+ 14 



+3596 



+4200 



The class width interval in the above scheme was chosen as 5. 
The observed frequencies are given in column 3. We thus find 
that the greatest frequency of 108 falls in the class interval 
255-259. Choosing this class interval as the origin we designate 
the other class intervals with their proper positive and negative 
numbers as shown in column 2. The provisional mean, Mo, 
is taken as the center of class 0, or Mo = 257. In this way the 
class interval w; = 5 is taken as the unit. 

The whole calculation is very simple. We first of all form the 
product X X F{x). The sum of these products divided with 
576 = N gives the distance — b — from the provisional mean to 
the arithmetic mean, expressed in units of the class interval, w. 
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We have thus: 

6 = w X 14 -^ 576 = + 0.0243m) = + 0.122, 
or 

M = 257 + 6 = 257.12. 

The formula for the dispersion takes the form 

" '^V~N — ^ J' 

where h is expressed in units of the class interval. The table gives 

us 

1:F{x)x^ = 3596 or 

(t2 = w2[3596 -^ 576 - (0.024)^] = w26.242, 

0- = w X 2.498 = 12.49. 

Charlier now checks the results by means of the following relation: 

2(.r + lfF{x) = 'Zx'Fix) + 2Si^(x) + ^F{x). 

For the above example we have: 

'2,x^F{x) = + 3,596 

21,xF{x) = + 28 

Sf (x) = + 576 



Sum = + 4,200 = S(a: + iyF{x), 

which proves the accuracy of the calculation. 

The full elegance of the Charlier self checking scheme is shown 
at a later stage under the calculation of the parameters of fre- 
quency curves. In the meantime the student may test the ad- 
vantage of the provisional mean by trying to compute the mean 
and the dispersion by the conventional school method. A 
direct computation by this method would in the last example 
take about a whole day's labor. 

Before we proceed to apply the formulas previously demon- 
strated, we wish to call the attention of the reader to the following 
important properties of the mean and the dispersion: 

1. The algebraic sum of the deviations from the mean — i. e., 
Iiim^ — M) — is zero. This follows immediately from formula 
(2) of §78. We have: 

M = ^ + Ma = h-\- Mo, 
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where Mo, the provisional mean, is an arbitrarily chosen number 
and 6 = S(m^ — Mo) -i- N. li Mq = M we have evidently 
& = 0, which proves the statement. 

2. The dispersion (standard deviation) is the least possible 
root-mean-square deviation, i. e., the root-mean-square deviation 
is a minimum, when the deviations are measured from the mean. 
We have (see formula (4)): 

, S(m. - MY Z(m. - Mpf 
" = N = N ^ ' 

from which the proposition follows a fortiori. 

79. Westergaard's Experiments. — The Danish statistician, 

Harald Westergaard, in his " Statistikens Teori i Grundrids" 

gives the following results of 10,000 observations divided into 

100 equal sample sets of drawings of balls from a bag containing 

an equal number of red and white balls (the ball was returned 

to the bag after each drawing): 

White: 33 34 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 

Frequency. 01 1222334565 11 95 10 48 

Wliite: 55 5o 57 58 59 60 61 62 63 

Frequency: 354400111. 

The elements as resulting from Westergaard's drawings clearly 
represent a BernouUian Series where the number of comparison 
s is equal to 100. Arranging the data in classes — taking 3 as 
the class interval — the computation of the mean and the dis- 
persion is easily performed by means of the Charlier self checking 
scheme. 
Bernouluan Sebies. Number or White Balls in 100 Drawings 







(Westergaard). 








s = 100, 


N = 100, 


Mo = 49, 


u) = 3. 




m. 


X. 


i?(i). 


xF(x). 


il^F(x-). 


(x+l)2i?(j;). 


33-35 


-5 


1 


- 5 


25 


16 


36-38 


-4 














39-41 


-3 


6 


-15 


45 


20 


42-44 


-2 


8 


-16 


32 


8 


45-47 


-1 


15 


-15 


15 





48-50 





25 








25 


51-53 


+1 


19 


+ 19 


19 


76 


54-56 


+2 


16 


+32 


64 


144 


57-59 


+3 


8 


+24 


72 


128 


60-62 


+4 


2 


+ 8 


32 


50 


63-65 


+5 


1 


+ 5 


25 


36 



Sum 100 (-51+88) 329 503 
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Control Check. 

Sa;2F(a;) = 329 

TLxFix) = 74 

1:F{x) = 100 



Sum = 503 = S(a; + \fF{x) 

b = w(88 - 51) : 100 = wX 0.37 = 1.11, 

or M = Mo+b= 50.11, 

ff2 = w2[329 : 100 - l^Y = w'(3.29 - 0.137) = 28.377, 

or (T = 5.33. 

Giving due allowance for the respective mean errors of the mean 
and the deviation we have finally :'- 

M = 50.11 ± 0.536, (T = 5.33 db 0.378. 

We shall now compare these values with the corresponding the- 
oretical values of the Bernoullian series. The a priori probabil- 
ities of drawing red and white are in this example p = 9 = §• 
Hence we have as the theoretical values for the mean and the 
dispersion: 

Jf^ = 100 X i = 50, ffg = VlOO X ^ X I = 5. 

A comparison between the observed and the theoretical ideal 
values — taking into account the proper mean errors — shows a 
very close agreement as far as the dispersion is concerned while 
the difference in the mean is about \ of the mean error. A 
computation of the Lexian Ratio and the Charlier Coefficient of 
Disturbancy yields the following results : 

L = 1.072; lOOp = 3.68. 

Taking into account the proper mean errors due entirely to 
the fluctuation of sampling we find, however, that our theoretical 
results and formulas of the previous chapters have been verified 
in an absolutely satisfactory manner. 

80. Charlier's Experiments. — In the above mentioned bro- 
chure, " Grunddragen," Charlier gives the results of a long series 
of card drawings illustrating the Bernoullian, the Poisson and 
the Lexian Series. As an example showing the frequency dis- 

1 6 is expressed in units of w. 

^ For mean errors of M and <r see Addenda. 
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tribution in a Bernoullian Series Charlier made 10,000 individual 
drawings (with replacements) from an ordinary whist deck and 
recorded the number of black and red cards drawn in this manner. 
Arranging the drawings in sample sets of 10 individual drawings, 
M. Charlier gives the following table: 

Bebnoullian Series. Number (to) of Black Cards in Sample Sets of 10. 
s = 10, N = 1,000, Mo = 5, w = 1. 



m. 


X, 


F{x) 


xF,x). 


a:! 


F(x). 


i.z+i)mx). 





-5 


3 


- 15 


+ 


75 


+ 48 


1 


-4 


10 


- 40 


+ 


160 


+ 90 


2 


-3 


43 


-129 


+ 


387 


+ 172 


3 


-2 


116 


-232 


+ 


464 


+ 110 


4 


-1 


221 


-221 


+ 


221 





5 





247 










+ 247 


6 


+ 1 


202 


+202 


+ 


202 


+ 808 


7 


+2 


115 


+230 


+ 


460 


+ 1,035 


8 


+3 


34 


+ 102 


+ 


306 


+ 544 


9 


+4 


9 


+ 36 


+ 


144 


+ 225 


10 


+5 


















Sum: 


1,000 


- 67 


+2,419 


+3,285 






Control Check. 












•2^xF{x) = 


+ 2,419 












2-2xFix) = 


- 134 












^Fix) = 


+ 1,000 









Sum = + 3,285 = S(a; + \YF{x) 

From the above values we obtain: 
6 = - 67 : 1,000 = - 0.67; a^ = 2,419 : 1,000 - 6^ = 2.415. 

Making due allowance for mean errors we have thus: 

M = 5 - 0.067 = 4.933 ± 0.050; a = 1.554 ± 0.035. 

For the theoretical mean and dispersion we obtain the following 
values: {v = q = \) 

Mb =5; <Tb= 1-581, 

which gives the following values for the Lexian Ratio and the 
Charlier coefficient: 

L = .983, lOOp is imaginary. 

These results would indicate a slightly subnormal series. Tak- 
ing into account the fluctuations due to sampling and for which the 
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mean error serves as a measure the results become normal and 
serve again as a verification of the theory. 

Poisson Series. — As an illustration of the frequency distribution 
in a Poisson Series Charlier made the following experiment: 
From an ordinary whist deck was drawn a single card and the 
color noted. Before the second drawing a spade was eliminated 
from the deck and replaced by a heart from another deck of 
cards, so that the deck then contained 12 spades, 13 clubs, 13 
diamonds and 14 hearts; from this deck another card was drawn 
and the color noted. Then another spade was eliminated and a 
heart substituted. From this deck, containing 11 spades, 13 
clubs, 13 diamonds and 15 hearts, a card was again drawn. The 
drawings were in this manner continued until all the spades were 
replaced by hearts. The same operation was applied to the 
clubs, which were replaced by diamonds. After 27 drawings 
the deck contained only red cards. Altogether 100 sample sets 
of 27 drawings were made with the following results: 



Poisson Series. 


Number 


(m) OF Black Cari 


DS IN Sample 


Sets of 27. 




s = 27, 


N = 100, 


M, = 7, 


w = 1. 




m. X. 


Fl^x). 


xF(,x). 


x^F{x). 


(x+l)^F(,x). 


Control Check. 


3 -4 


2 


- 8 


+ 32 


18 




4 -3 


6 


-18 


+ 54 


24 


+378 


5 -2 


14 


-28 


+ 56 


14 


+ 32 


6 -1 


14 


-14 


+ 14 





+100 


7 


22 








22 






8 +1 


17 


+17 


+ 17 


68 


+510 


9 +2 


14 


+28 


+ 56 


126 




10 +3 


8 


+24 


+ 72 


128 




11 +4 


1 


+ 4 


+ 16 


25 




12 +5 


1 


+ 5 


+ 25 


36 




13 +6 


1 


+ 6 


+ 36 


49 





Sum: 100 +16 +378 510 

The calculation of the mean and the dispersion with their 
respective mean errors yields the following result : 

b= + 0.16, M = 7.16 ± 0.211, 
a^ = 3.78 - (0.16)2 = 3.754, <r = 1.937 ± 0.149. 

The theoretical Poisson values according to the formulas of 

§67 are: 

Mp = 6.75, <7p = 2.111. 

If we now take the arithmetic mean of the various proba- 
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bilities of drawing a black card we find that po = J. If all the 
drawings had been performed with a constant probability we 
should according to the Bernoullian scheme have: 

J4 = 27 X i = 6.75, a^ = V27 + i X | = 2.25. 

These results verify the formulas as obtained under the discussion 
of the Poisson Series. {Mp = M^, ap < cr^.) 

Lexiaii Series. — In testing the Lexian Series Charlier first 
took 10 samples of 10 individual drawings in each sample from 
an ordinary whist deck. The number of black cards thus 
drawn was recorded. After this, 10 samples of the same mag- 
nitude were taken from a deck containing 25 black and 27 red 
cards; and then 10 samples from a deck with 24 black and 28 red 
cards. Of the total 270 samples (until the deck contains only 
red cards) Charlier gives the first 100 which gave the following 
result : 

Lexian Series. Number (m) of Black Cards in 10 Drawings. 
s = 10, JV = 100, Ma = 4. 



m. 


X. 


nx). 


xF(x) 


x'^Fix). 


(.x+l)''Fix). 


Control Check. 


1 


-3 


4 


-12 


+ 36 


+ 16 




2 


-2 


9 


-18 


+ 36 


+ 9 




3 


-1 


19 


-19 


+ 19 


+ 




4 





21 








+ 21 




5 


+1 


23 


+23 


+ 23 


+ 92 




6 


+2 


10 


+20 


+ 40 


+ 90 


+294 


7 


+3 


12 


+36 


+108 


+ 192 


+ 76 


8 


+4 


2 


+ 8 


+ 32 


+ 50 


+100 



Sum: 100 +38 +294 +470 +470 

The final computations (with mean errors) give: 

b = + 0.38, M = 4.38 ± 0.167, 

(7= = 294 : 100 - ¥= + 2.796, tr = + 1.672 ± 0.118. 

The mean probability in all trials was: 

Po = 21.50 : 52 = 0.4,135, or Mj, = spo = 4.135, 

ffg = '^spoqo = 1.557. 

A calculation of the mean and the dispersion according to the 
formulas under the Lexian Series (see § 74) gives according to 
Charlier: 

M^ = 4.135, (Ti = 1.643. 
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This shows that the dispersion in a Lexian Series is greater than 
the corresponding Bernoulhan dispersion. The Lexian Ratio: 
L = (Tj^ : (Tg has the value 1.06. The series according to the 
terminology of Lexis has a hypernormal dispersion, although 
a very small one. Charlier in "Grunddragen" (§ 30) says that 
when arranging the material in 27 samples, each saajpie con- 
taining 100 single trials, the Lexian Ratio has the value i=3.82, 
indicating a greater hypernormal dispersion than in the smaller 
samples. 

81. Experiments by Bon3mge and Fisher. — As an additional 
verification of the Bernoullian, Poisson and Lexian Series my 
co-editor, IMr. Bonynge, and myself have repeated the experi- 
ments of Westergaard and Charlier in a slightly modified form. 

Bernoullian Series. — In 20 sample sets, each set containing 
500 individual drawings, from an ordinary whist deck, I counted 
the number of diamonds drawn in each sample. My records gave 
the following scheme: 

Bebnotjllian Series. Number of Diamonds (m) in 20 Sample Sets op 





500 Drawings. 






s = 500, iV = 


20, Mo = 125. 




m. 


m - 


-Mo. 


(m - Jlfo)«. 


123 


- 2 




4 


143 




+ 18 


324 


124 


- 1 




1 


133 




+ 8 


64 


142 




+ 17 


289 


130 




+ 5 


25 


117 


- 8 




64 


122 


- 3 




9 


132 




+ 7 


49 


109 


-16 




256 


130 




+ 5 


25 


139 




+ 14 


196 


138 




+ 13 


169 


129 




+ 4 


16 


136 




+ 11 


121 


121 


- 4 




16 


135 




+ 10 


100 


124 


- 1 




1 


135 




+ 10 


100 


116 


- 9 




81 



Sum: -44 +122 1,910 

The results with their respective mean errors are as follows: 
M = 128.9 ± 2.01, a- = 8.962 ± 1.416 
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The theoretical Bernoullian mean and the dispersion have the 
values: 

Ms = 125, <7s = aGp? = VSOO X i X f = 9.682, 

where p = I denotes the a priori probability of drawing a 
diamond. 

Again I counted the number of aces (irrespective of color) 
which appeared in 100 sample sets of 100 individual drawings 
from the same deck of cards. The records arranged in classes 
gave the following scheme: 

Number of Aces (to) in 100 Sample Sets of 100 Individual Drawings. 

s = 100, N = 100, Mo = 8, w = 1. 

■n. X. Fix). xF(,x). x^Fix). (,x+l)^F{x). Control Check 

2-6 1-6 36 25 

3-5 8 -40 200 128 

4-4 8 -32 128 72 



5 


-3 


7 


-21 


53 


28 




6 


-2 


9 


-18 


36 


9 




7 


-1 


21 


-21 


21 







8 





13 








13 




9 


+1 


15 


+ 16 


15 


60 




10 


+2 


3 


+ 6 


12 


27 




11 


+3 


9 


+27 


81 


144 


+811 


12 


+4 


1 


+ 4 


16 


25 


-110 


13 


+5 


2 


+ 10 


50 


72 


+ 100 


Id 


+6 


2 


+ 12 


72 


98 




At: 




15 


+7 





+ 








801 


16 


+8 





+ 










17 


+9 


1 


+ 9 


81 


100 





Sum: 100 -55 811 801 

6 = - 55 : 100 = - 0.55, 
M = Mo+h = 7.45 ± 0.279 (with mean error), 

or 

ff = 2.794 ± 0.198 (with mean error). 

The theoretical Bernoullian values are: 

Ms = 100 X tV = 7.69, (Ts = VlOO X-^sX{i = 2.663. 

A comparison between the empirical and the theoretical a priori 
values exhibits a close correspondence. 
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Poisson Series.— As an illustration of the Poisson Series 
Mr. Bonynge made the following experiment. A sample set of 
20 single drawings of balls from an urn (one ball being drawn at a 
time) was made under the following conditions: 

In drawing No. 1 the urn contained 20 white and 20 black balls. 
" " " 2 " " " 21 " " 19 " " 

" " " 3 " " " 22 " " 18 " 



' " " 20" " " 39 " " 1 " " 

Altogether Bonynge took 500 sample sets which arranged in 
classes give the following scheme: 

Poisson Series. Number of Black Balls (ot) in 500 Sample Sets of 20 
Individual Drawings (Bonynge). 

s = 20, N = 500, Mo = 5. 



m. 


X. 


FM. 


xF(,x). 


x^'Fl^x). 


(i + iyF(x). 





-5 


2 


- 10 


50 


32 


1 


-4 


9 


- 36 


144 


81 


2 


-3 


35 


- 105 


315 


140 


3 


-2 


52 


- 104 


208 


52 


4 


- 1 


86 


- 86 


86 





5 





109 








109 


6 


+ 1 


85 


+ 85 


85 


340 


7 


+ 2 


69 


+ 138 


276 


621 


8 


+ 3 


30 


+ 90 


270 


480 


9 


+ 4 


16 


+ 64- 


256 


400 


10 


+ 5 


6 


+ 30 


150 


216 


11 


+ 6 


1 


+ 6 


36 


49 


Sum 


S = 


500 


+ 72 


1876 


2520 


Hence we '. 


have: 











h = 0.144, M = 5.144, a'' = 3.732, <r = 1.932. 

The theoretical Poisson values are: 

Mp = 5.25, <Tp = 1.86 (see formulas, § 74). 

The mean of the various probabilities of drawing a black ball is 
po = f^. According to the Bernoullian scheme we should then 
have the following values for the mean and the dispersion: 

M^ = 20 X l-J = 5.25, <r^ = (20 X U X f|)* = 1.968. 

These values confirm the Poisson theorems (Mp = Mg, Op < a^. 
Lexian Series. — As additional illustration of the Lexian Series 
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I took 20 sample sets, each set containing 500 drawings of a 
single ball from an urn (with replacements). The contents of 
the urn varied from set to set as follows: 

Sample set No. 1 : 20 white and 20 black balls. 
" " " 2 : 21 " " 19 " " 
" " " 3 : 22 " " 18 " " 



" " " 20 : 39 " " 1 " 

In the 21st set all the black balls were eliminated and the urn 
contained white balls only. This set, however, was not taken in 
consideration in calculating the mean and the dispersion. 

Lexian Series. Number (to) of Black Balls in 20 Sample Sets of 500 
Individual Drawings (Fisher). 

s = 500, N = 20, Mo = 130. 



No. of Set. 


m. 


(m - Mo). 


(ot - Mo)2. 


1 


251 


+ 121 


14641 


2 


246 


+ 116 


13456 


3 


222 


+ 


92 


8464 


4 


216 


+ 


86 


7396 


5 


193 


+ 


63 


3969 


6 


176 


+ 


46 


2116 


7 


183 


+ 


53 


2809 


8 


173 


+ 


43 


1849 


9 


156 


+ 


26 


676 


10 


135 


+ 


5 


25 


11 


140 


+ 


10 


100 


12 


127 


- 3 




9 


13 


115 


- 15 




225 


14 


96 


- 34 




1156 


15 


78 


- 52 




2704 


16 


69 


- 61 




3721 


17 


55 


- 75 




5625 


18 


43 


- 87 




7569 


19 


29 


- 101 




10201 


20 


19 


-111 




12321 


Sum: 


s = 


- 539 + 661 


99012 



h = (661 - 539) : 20 = 6.6, M = Mo + h = 136.6 ± 15.86. 
(72 = 99012:20-62 = 4913.4, a = 70.098 ±11. 09 (with mean errors). 
The theoretical Lexian values are: 

Ml = 131.25, o-x = 72.676 (see § 74). 
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If the series represented a true BernouUian Series, we should 
have 



Ms = 500 XU = 131.25, a^ = V500 X f i X f f = 9.839. 

These values confirm the Lexian Theorem {Ml = Mb, (Tl>(Tb)- 
A computation of the Charlier CoeflBcient of Disturbancy from 
the observed values gives : 

100, = 50.80 

whereas the theoretical value is 55.38, showing a decidedly 
hypernormal dispersion, a result which was to be expected since 
the probabilities of drawing black varies from \ to ^V i'^ the 
various sets of samples. 

All the above experiments show a completely satisfactory 
verification of the various theorems of the previous chapters 
and may perhaps serve as a vindication of the followers of 
Laplace, who like him hold that an a ■priori foundation for 
probabiHty judgments is indispensable. 
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CHAPTER XII. 

CONTINUATION OF THE APPLICATION OF THE THEORY OF 
PROBABILITIES TO HOMOGRADE STATISTICAL SERIES. 

82. General Remarks. — In this chapter it is our intention to 
discuss the apphcation of the theory of probabilities to homograde 
statistical series with special reference to vital statistics. We 
owe the reader an apology, however, inasmuch as in the former 
paragraphs we have employed the term statistics without defining 
its meaning in a rigorous manner. A definition may perhaps 
appear superfluous since statistics nowadays is almost a house- 
hold word. The term unfortunately is often employed as a mere 
phrase without any understanding of its real meaning. This 
applies especially to that band of self-styled statisticians, mere 
dilettanti, who, with an energy which undoubtedly could be 
better employed otherwise, attempt to investigate and analyze 
mass phenomena regardless of method and system. When 
investigations are undertaken by such dilettanti the common 
gibe that "statistics will prove anything" becomes, alas, only 
too true and proves at least that " like other mathematical tools 
they can be wielded effectively only by those who have taken the 
trouble to understand the way they work."^ 

By the science of statistics we understand the recording and 
subsequent quantitative analysis of observed mass phenomena. 

By mathematical statistics {also called statistical methods) we 
understand the quantitative determination and measurement of the 
effect of a complex of causes acting on the object under investigation 
as furnished by previously recorded observations as to certain attri- 
butes among a collective body of individual objects. 

Practical statistics — if such a name may be used — then simply 
becomes the mechanical collection of statistical data, i. e., the 
recording of the observed attributes of each individual. In no 
way do we wish to underestimate the importance of this process 

1 See Nunn, "Exercises in Algebra" (London, 1914), pages 432-33. 

146 
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which is as important for the statistical analysis as is the gathering 
of structural materials for the erection of a large building. 

Mathematical statistics is thus the tool we must use in the final 
analysis of the statistical data. It is a very effective and powerful 
tool when used properly by the investigator. At the same time 
it is not an automatic calculating machine in which we need only 
put the material and read off the result on a dial. A person 
without any knowledge whatsoever about the nature of loga- 
rithms may in a few hours be taught how to use a logarithmic 
table in practical computations, but it would be foolish to view 
the formulas and criteria from probabilities when applied to 
statistical data in the same light as a table of logarithms in cal- 
culating work. Such formulas and criteria must be used with 
caution and discretion and only by those who have taken the 
trouble to make a thorough study of probabilities and master 
their real meaning and their relation to mass phenomena. If 
put in the hands of mere amateurs the formulas become as 
dangerous a toy as a razor to a child. 

It is not our intention to give in this work a description of the 
technique of the collection of the material, which depends to a 
large extent on local social conditions and for which it is difficult 
to give a set of fixed rules. In the following we shall treat the 
mathematical methods of statistics exclusively, and furthermore 
make the theory of probabilities the basis of our investigations. 

83. Analogy between Statistical Data and Mathematical 
Probabilities. — Let us for the moment imagine a closed commun- 
ity with a stationary population from year to year and let us 
denote the size of such a population by s. Let us furthermore 
suppose we were given a series of numbers: 

mi, m2, ma, • • • m^f, 

denoting the number of children born in various years in this 
community. The ratios 

mi 7112 ma m^ 

V ' V ' V' 7 

may then be looked upon as probabilities of a childbirth in 
various years. As Charlier justly remarks, "such an identi- 
fication of a statistical ratio with a mathematical probability is. 
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at first sight a mere analogy which possibly may have very little 
in common with the observed statistical phenomena, but a 
closer scrutiny shows the great importance for statistics of such' 
a view." If such ratios could be regarded as mathematical 
probabilities wherein the various m's were identical to favorable 
cases in s total trials, the mean and the dispersion could be de- 
termined a priori from the Bernoullian Theorem. The founders 
of mathematical statistics regarded the identification of an or- 
dinary statistical series with a Bernoullian Series almost as 
axiomatic. This view is found even among some leading writers 
of the present time. Among others we apparently find this 
traditional view by the eminent English actuary, G. King, in his 
classic "Text Book." In Chapter II of this well-known standard 
actuarial treatise a probability is defined as follows : " If an event 
may happen in a ways and fail in /3 ways, all these ways being 
equally likely, the probability of the happening of the event is 
a -r- (a -|- /?)." With this definition as a basis King then de- 
duces the elementary formulas of the addition and multiplication 
theorems. He then continues: "Passing now to the mortality 
table, if there be Ix persons living at age x, and if these Z^+n survive 
to age X -\- n, then the probability that a life aged x will survive 
n years is Ix+n -^ L = nPx- And again "the probability that a 
life aged x and a life aged y will both survive n years is nPxX nVv-"^ 
From the above it would appear that the author unreservedly 
assumes a one-to-one correspondence between the Ix+n survivors 
and "favorable ways" as known from ordinary games of chance 
and a similar correspondence between the original Ix persons and 
"equally possible cases." A simple consideration will show that 
there exists no a priori reason for such a unique correspondence 
between ordinary empirical death rates and mathematical proba- 
bilities. None of the original 4 persons can be considered as 

1 Mr. H. Moir in his "Primer of Insurance" tried to avoid the difficulty by 
giving a wholly new definition of ''equally likely events." According to 
Moir "events may be said to be 'equally likely' when they recur with regu- 
larity in the long run." Apart from the half metaphorical term "in the long 
run" Mr. Moir fails to state what he means by the expression "with regu- 
larity." If the statement is to be understood as regular repetitions of a certain 
event in various sample sets, it is evident that we may obtain a regular recur- 
rence of the observed absolute frequencies in a Poisson Series, where — as 
we know — the events are not equally likely." — A.F. 
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being "equally likely" as in the sense of games of chance. 
Numerous factors such as heredity, environment, climatic and 
economic conditions, etc., play here a vital part in the various 
complexes embracing the original l^ persons. 

The belief in an absolute identity of mathematical probabilities 
and statistical frequency ratios seems to have originated from 
Gauss. The great German mathematician — or rather the 
dogmatic faith in his authority as a mathematician — proved 
thus for a number of years a veritable stumbling block to a 
fruitful development of mathematical statistics. Gauss and his 
followers maintained that all statistical mass phenomena could 
be made to conform with the law of errors as exhibited by the 
so-called Gaussian Normal Error Cur^'e. If certain statistical 
series exhibited discrepancies they claimed that such deviations 
arose from the limited number of observations. The deviations 
would become less marked if the number of observed values was 
enlarged and would eventually disappear as the number of ob- 
servations approached infinity as its ultimate value. The Gaus- 
sian dogma held sway despite the fact that the Danish actuary, 
Oppermann, and the French mathematicians, Binemaye and 
Cournot, have pointed out that several statistical series, despite 
all efforts to the contrary offered a persistent defiance to the 
Gaussian law. The first real attack on the dogma laid down so 
authoritatively by Gauss was delivered by the French actuary, 
Dormay, in certain investigations relating to the French census. 
It was, however, first after the appearance of the already men- 
tioned brochure by Lexis, "Die Massenerscheinungen, etc.," that 
a correct idea was gained about the real nature of statistical 
series. 

The Lexian theory was expounded in the previous chapters of 
this work, and we are therefore ready to enter upon the investi- 
gations of a few selected mass observations from the domain of 
vital statistics. 

84. Number of Comparison and Proportional Factors. — In 
the mathematical treatment of the Lexian theory of dispersion 
we tacitly assumed that the total number of individual trials in 
a sample set or the number of comparison, s, remained constant 
from set to set. In the observations on games of chance it 
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remained in our power to arrange the actual experiments in such 
a manner that s would be constant. In actual social statistical 
series such simple conditions do not exist. In comparing the 
number of births in a country with the total population it is 
readily noticed that the population does not remain constant 
but varies from year to year. For this reason the various 
numbers m denoting the births are not directly comparable with 
another. We may, however, easily form a new series of the form: 

S S S 9 

— • mi, — ■ m2, — • ms, • ■ • — • m^, 
Sl 52 S3 s^ 

wherein the various numbers, mi, m2, ma • • • , corresponding to 
the numbers of comparison Si, Si, s^, ■ ■ ■ , are reduced to a constant 
number of comparison s. This series is by Charlier called a 
reduced statistical series. Such a reduction requires, in many 

Proportional Factoes for a Hypothetical Stationary Population in 

Sweden and Denmark Equal to 5,000,000 and 2,500,000 

Respectively. 





Sweden, 






Denmark, 




Year. 


Inhabitants. 


«:»4. 


Year. 


Inhabitants. 


s: Sk 


1876 


4,429,713 


1.1288 


1888 


2,143,000 


1.1666 


1877 


4,484,542 


1.1150 


89 


2,161,000 


1.1569 


1878 


4,531,863 


1.1033 


1890 


2,179,000 


1.1473 


79 


4,578,901 


1.0919 


91 


2,195,000 


1.1390 


1880 


4,565,668 


1.0952 


92 


2,210,000 


1.1312 


81 


4,572,245 


1.0936 


93 


2,226,000 


1.1230 


82 


4,579,115 


1.0919 


94 


2,248,000 


1.1121 


83 


4,603,595 


1.0861 


1895 


2,276,000 


1.0984 


84 


4,644,448 


1.0765 


96 


2,306,000 


1.0841 


1885 


4,682,769 


1.0677 


97 


2,338,000 


1.0694 


86 


4,717,189 


1.0600 


98 


2,371,000 


1.0544 


87 


4,734,901 


1.0560 


99 


2,403,000 


1.0404 


88 


4,748,257 


1.0530 


1900 


2,432,000 


1.0280 


89 


4,774,409 


1.0472 


01 


2,462,000 


1.0154 


1890 


4,784,981 


1.0449 


02 


2,491,000 


1.0036 


91 


4,802,751 


1.0410 


03 


2,519,000 


0.9925 


92 


4,806,865 


1.0402 


04 


2,546,000 


0.9819 


93 


4,824,150 


1.0365 


1905 


2,574,000 


0.9713 


94 


4,873,183 


1.0261 


06 


2,603,000 


0.9604 


1895 


4,919,260 


1.0165 


07 


2,635,000 


0.9488 


96 


4,962,568 


1.0076 


08 


2,668,000 


0.9370 


97 


5,009,632 


0.9981 


09 


2,702,000 


0.9252 


98 


5,062,918 


0.9875 


1910 


2,737,000 


0.9134 


1899 


5,097,402 


0.9809 


11 


2,800,000 


0.8929 


1900 


5,136,441 


0.9734 


1912 


2,830,000 


0.8834 
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cases, a certain correction. However, when the general ratios 
.-' -H .■jj v^- = 1, 2, 3 • • • ^') are close to unity the reduced series 
may be treated as a directly observed series. In most of the 
following examples taken from Scandina\-ian statistical tabular 
works the proportional factor s 4- .>■,. is close to unity as shown in 
the table below. For Sweden I have, following CharUer, assumed 
a stationary population .« = 5,000,000. The corresponding 
Danish .•< I have taken as 2,500,000. 

The above figures are taken from " Sveriges officielle statistik " 
and "Statistisk Aarbog for Danmark" for 1913 (Precis de 
Statistique, 1913). 

85. Child Births in Sweden. — From Charlier's "Grimddragen" 
I select the foUo^4ng example showing the number of children 
born in Sweden in the period from 1 SSI -1900 as reduced to a 
stationary population of 5,000,000. 





s = .^.000,000, 


.V = 20. 


J/o = 140,000. 


Tear. 


m. 


m-U^ 




{m~iS:''. 


1S61 


145,230 


-5.230 




27.352,900 


S2 


146,630 


—6.630 




44.0S9.600 


S3 


144.320 


4-4.320 




IS.662.400 


S4 


149.360 


-9.360 




S7.609.600 


ISSo 


146.600 


-rO.eoo 




43.560.000 


S6 


US. 270 


-S.270 




6S.392,900 


S7 


14S.020 


-S.020 




64,3-20.400 


ss 


143.6S0 


-3.6S0 




13.542,400 


S9 


loS.SOO 




-1,700 


2.S90.000 


1S90 


139,600 




- 400 


160.000 


91 


141.070 


-rl,070 




1.144,900 


92 


134.S30 




-5.170 


26.r2S,900 


93 


136.540 




-3,460 


11,971.600 


94 


134,S40 




-5.160 


26.625.600 


1?Q5 


136.S20 




-3.1S0 


10,112.400 


96 


135.330 




-4.670 


21.SOS.900 


97 


132.7.50 




-7.2.50 


52.562.500 


9S 


134. S20 




-5. ISO 


26,S:32.400 


99 


131.320 




— S.6S0 


75.342.400 


1900 


134.460 




-5.540 


30,691,600 




Sum - = • 


4- 53,190 ■ 


-50,390 


6^4,401,400 



From which we obtain: 

b= (+ 53.190 - 50,390) : 20 = 140 
J/ = J/o + 6 = 140,140 
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a^ = 654,401,400 : 20 - 5^ = 32,700,470, ox a = 5,718. 
The empirical probability of a birth (po) is 
po = ^1/ : 5 = 0.02803, so that go = 1 - Po = 0.97197 and the 
Bernoullian dispersion 

as = "Vspo ?o = 369.0. 

The actual observed dispersion (5,718) is thus much greater 
than the Bernoullian. The birth series is considerably hyper- 
normal. The Lexian ratio has the value 

L = 5,718 : 369.0 = 15.50, 

while the Charlier coefficient of disturbancy is: 

lOOp = 4.07. 

Both the values of L and p show that the birth series by no 
means can be compared with the ordinary games of chance but 
is subject to outward perturbing influences. 

86. Child Births in Denmark. — The following example shows 
the corresponding birth series for Denmark in the 25-year period 
from 1888-1912 as reduced to a stationary population of 2,500,000. 
The computation of the various parameters follows: 

h = (39,713 - 30,287) : 25 = ^- 377, 

M = Mo+b = 73,377, 

(T^ = 281,208,156 : 25 - 6^ = 11,106,197.2, 

o-/ = spo go = 71,223. (po = M : s = 0.0293508), 

L = a : (Tb = 12.5 

lOOp = 100( ^1<T^ - a/) : M = 4.52. 

Number of Children Born in Denmark as to Calendar Year. 
s = 2,500,000, N = 25, Mo = 73,000. 



Year. 


m. 


1888 


78,659 


89 


77,956 


1890 


76,154 


91 


77,377 


92 


74,059 


93 


76,965 


94 


75,956 


1895 


75,649 


96 


76,183 


97 


74,404 



m— Ma. 


(m-ilfo)2. 


+ 5,659 


32,024, 281 


+ 4,956 


24,561,936 


+ 3,154 


9,947,716 


+ 4,377 


19,158,129 


+ 1,059 


1,121,481 


+ 3,965 


15,721,225 


+ 2,956 


8,740,636 


+ 2,649 


7,017,201 


+ 3,183 


10,131,489 


+ 1,404 


1,971,216 
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Year. 


m. 




m 


—Afo- 


(.m-Mo)'. 


98 


75,570 




+ 2,570 


6,604,900 


99 


74,236 




+ 


1,236 


1,527,606 


1900 


74,146 




+ 


1,146 


1,313,316 


01 


74,341 




+ 


1,341 


1,798,281 


02 


73,058 




+ 


58 


3,364 


03 


71,802 


- 1,198 






1,435,204 


04 


72,359 


- 641 






410,881 


1905 


70,981 


- 2,019 






4,076,361 


06 


71,280 


- 1,720 






2,958,400 


07 


70,516 


- 2,484 






6,170,256 


08 


71,438 


- 1,567 






2,455,489 


09 


79,597 


- 2,403 






5,774,409 


1910 


68,777 


- 4,223 






17,833,729 


11 


66,016 


- 6,984 






48,776,256 


1912 


65,952 


- 7,048 






49,674,304 




Sum: 2 


= -30,287 


+39,713 


281,208,156 



Practically the same deductions hold true for this Danish 
series as for the Swedish series. We meet again a hj-pernormal 
series subject to perturbing influences. The closeness of the 
two values of the Charlier coefficient of distm-bancy indicates 
that the munber of births in Sweden and Denmark apparently 
are subject to the same outward disturbing influences. 

87. Danish Marriage Series. — The following table shows the 
number of marriages in Denmark from 1888-1912. 

XUMBEK OF MaKRIAGES IN DeNMAEK. 

s = 2,500,000, N = 25, Mo = 18,000. 
Year. m. 

1888 17,605 

89 17,622 

1890 17,181 

91 17,017 

92 17,012 

93 17,676 

94 17,445 
1895 17,736 

96 18,239 

97 18,676 

98 18,870 

99 18,661 
1900 19,015 

01 17,870 

02 17,712 

03 17,791 

04 17,895 
1905 17,947 



■ Mo. 






(m-Mo)'. 


395 






156,025 


378 






142,884 


819 






670,761 


983 






966,289 


988 






976,144 


324 






104,976 


555 






308,025 


264 






69,696 




+ 


239 


57,121 




+ 


676 


456,976 




+ 


870 


756,900 




+ 


661 


436,921 




+1,015 


1,030,225 


130 






16,900 


288 






82,944 


209 






43,681 


105 






11,025 


53 






2,809 
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Year. 


m. 


06 


18,592 


07 


19,072 


08 


18,750 


09 


18,453 


.910 


18,255 


11 


17,749 


.912 


18,034 



251 



m— Mo. 


(m-ilfo)2. 


+ 592 


350,464 


+ 1,072 


1,149,184 


+ 750 


562,500 


+ 453 


205,209 


+ 255 


65,025 




63,001 


+ 34 


1,156 



Sum: 2 = -5,742 +6,617 8,686,841 

Hence we have: 

h = (6,6-17 - 5,742) : 25 = 35, M = Mo + & = 18,035. 

0-2 = (8,686,841 : 25) -W= 346,249, a = 588.43, 

cb = 133.81, L = 4.41, lOOp = 5.73. 

We encounter again a hypernormal series with quite large 
perturbations. For Sweden Charlier has computed the coef- 
ficient of disturbancy for marriages in the period 1876-1900 and 
found it to be 5.49. A comparison with the same quantity for 
the above Danish data shows that the perturbing influences 
for the two countries are about the same. 

88. Stillbirths. — As another example from vital statistics I 
give the number of stillbirths in Denmark from 1888-1912 as 
compared with a hypothetical number of 70,000 births per annum. 

Number of Stillbirths in Denmark as Reduced to a Stationary Number 





OF 70,000 Births per Annum. 






s = 70,000, N 


= 25, 


iWo = 


= 1,700. 




Year. 


m. 




m- 


-Mo. 


(m-MoV 


1888 


1,861 




+ 


161 


25,921 


89 


1,924 




+ 


224 


50,176 


1890 


1,830 




+ 


130 


16,900 


91 


1,779 


. 


+ 


79 


6,241 


92 


1,811 




+ 


HI 


12,321 


93 


1,788 




+ 


88 


7,744 


94 


1,719 




+ 


19 


361 


1895 


1,753 




+ 


53 


2,809 


96 


1,714 




+ 


14 


196 


97 


1,811 




+ 


111 


12,321 


98 


1,797 




+ 


97 


9,409 


99 


1,737 




+ 


37 


1,369 


1900 


1,696 


- 4 






16 


01 


1,732 




+ 


32 


1,024 


02 


1,694 


- 6 






36 


03 


1,685 


- 15 






225 


04 


1,682 


- 18 






324 
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Year. m. M-m^. (m-Afo)'. 

1905 1,705 +5 25 

06 1,620 - 80 6,400 

07 1,723 + 23 529 

08 1,694 - 6 36 

09 1,665 - 35 1,225 
1910 1,658 - 42 1,764 

11 1,659 - 42 1,764 

12 1,638 - 62 3,844 

Sum: 2 = -310 +1,184 161,216 

Actual computation gives: 
b = (1,184 - 310) : 25 = 34.96, M = 1,734.96, 
(7^ = 161,216 : 25 - 62 = 5,226.44, lOOp = 3.407. 

The series is again hypernormal. We shall show presently, 
when discussing the disturbing influences, that this series after 
the elimination of the secular perturbations actually represents a 
normal series. In the meantime we give a few examples relating 
to accident statistics. 

89. Co£il Mine Fatalities. — The following table gives the 
number of deaths from accidents in coal mines in various countries 
in the period 1901-1910 together with the number of compari- 
son s. 

United 
Year Belgium Austria England France Germany Japan States 

s = 140,000 s = 68,000 s = 900,000 s = 180,000 s = 500,000 s = 110,000 a = 610,000 

1901 164 81 1,224 218 1,170 263 1,982 

02 150 73 1,116 196 995 188 2,263 

03 160 50 1,134 184 960 278 1,952 

04 130 62 1,116 193 900 239 2,135 
1905 127 99 1,215 187 930 354 2,214 

06 133 70 1,161 1,262 985 578 2,944 

07 1A4 73 1,179 198 1,240 399 2,977 

08 150 58 1,188 171 1,355 262 2,220 

09 133 73 1,287 210 1,021 667 2,440 
1910 133 • 63 1,530 194 985 245 2,391 

This gives the following values for the Charlier coefficient: 

loop 

Belgium 2.55 

Austria 13.85 

England 4.71 

France 34.19 

Germany 9.27 

Japan 44.121 

U.S. A 12.07 

1 1 doubt whether the Japanese data as given by the Bureau of Mines are 
reliable. 
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The comparatively large values of p show that the fatal ac- 
cidents in coal mines are subject to violent perturbations. The 
disturbing influences are greatest for France where the Charlier 
coefficient is above 34, which immediately shows that some 
powerful disturbing influence has made itself felt. Looking over 
the table we find a very large number of deaths for the year 1906. 
The extremely heavy death rate in this year was caused by the 
Courrieres mine explosion, in which 1,099 persons lost their lives 
and marks probably the most fatal disaster in the whole history 
of coal-mining. Eliminating this catastrophe from the data in 
the table given above we find indeed that the coefficient of dis- 
turbancy becomes imaginary, indicating very stable conditions 
in French mines. Thus eliminating the more fatal catastrophes 
we get at least for France a subnormal series for the everyday 
accidents. In order better to illustrate the influence of the 
elimination of the most disturbing catastrophes I submit the 
following two series as reduced to a stationary s = 630,000 of 
fatal coal mine accidents in the United States in the period 
1900-1914 as recorded by the Bureau of Mines. The first series 
shows total number of deaths m*, the second series gives the total 
deaths m^' per year after eliminating all such accidents in which 
5 or more men were killed. 

Number of Deaths from Accidents in Coal Mines in United States. 







s = 


630,000, N = 


15. 








m* 


mk' 






vn 


mt' 


1900 


2,173 


1,843 




1908 


2,293 


1,967 


01 


2,048 


1,863 




09 


2,520 


2,053 


02 


2,337 


1,837 




1910 


2,470 


2,085 


03 


2,016 


1,768 




11 


2,350 


1,984 


04 


2,205 


1,911 




12 


2,060 


1,839 


1905 


2,286 


1,964 




13 


2,350 


1,957 


06 


2,111 


2,075 




1914 


2,070 


1,810 


07 


3,074 


2,190 











The first series gives a coefficient of disturbancy equal to 11.06 
while the same quantity for the second series has the value 5.51. 
Despite the fact that the coefficient of disturbancy is reduced 
about 50 per cent, there still remains disturbing influences, which 
clearly shows that conditions in American mines are not so stable 
as in the mines of France, Belgium and England. 
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90. Reduced and Weighted Series in Statistics. — So far all 

our problems in statistical analysis have been related to series 
where the value of s was constant or where the ratio s : Sk was 
so close to unity that it might be used as a factor of propor- 
tionality. We shall now consider the case where this ratio differs 
greatly from unity. As an illustration of this kind of series I 
choose the number of fatal coal mine accidents in various states 
of the American Federation together with the number of people 
engaged in coal mining in these states. The figures as taken from 
the report of the Bureau of Mines relate to the year of 1914.^ 

Number of Persons Engaged in Mining (st) and Number Killed 

( nii) in 20 States During the Year 1914. 

s = 1000, N = 20. 

a*. rrik. Jjos*. |mj-po«il- 

1 Alabama 24,552 128 73 55 

2 Colorado 10,550 75 31 44 

3 lUmois 79,529 141 237 96 

4 Indiana 22,110 44 66 22 

5 Iowa 15,757 37 47 10 

6 Kansas 12,500 33 37 4 

7 Kentucky 26,332 61 79 18 

8 Maryland 5,675 18 17 1 

9 Missouri 10,418 19 31 12 

10 New Mexico 4,021 18 12 6 

11 Ohio 45,815 62 136 74 

12 Oklahoma 8,948 31 27 24 

13 Pennsylvania 175,745 595 524 71 (Anthracite Mines) 

14 Pennsylvania 172,196 402 513 111 (Bituminous Mines) 

15 Tennessee 9,580 26 29 3 

16 Texas 4,900 11 15 4 

17 Virginia 9,162 27 27 

18 Washington 5,730 17 17 

19 W. Virginia 74,786 371 223 148 

20 Wyoming 8,353 51 25 26 

Sum: 2 = 726,659 2,167 709 

It will be noted that the population engaged in mining varies 
greatly from state to state. In making a simple reduction to a 
common number of comparisons by a proportional factor it is 
evident, however, that we would give the same weight to the 
observed from New Mexico with a population of miners equal to 

1 Catastrophes in the Eecles Mine in West Virginia and in the Royalton 
Mine of Illinois are eliminated. 
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4,021 as to the mining population of the state of Pennsylvania 
where over 340,000 persons are engaged in the same industry. 
This procedure is faulty. Let us imagine for the moment two sets 
of drawings from a bag containing white and black balls. The 
first sample set contained 10,000 drawings and the second set 
only 100 drawings. If these series were reduced to a common 
number of comparison s = 1,000 we should have 

' mi and "T>|7r™2 (mi and nii standing for the number 

of white balls) as the number of white balls drawn in sample sets 
of 1,000 single drawings. 

But these values are not equally reliable. The mean error in 
the second series is in fact 10 times as large as the mean error in 
the first series. In order to overcome this difficulty we ask the 
reader to consider the following series: 

The element — mi is repeated si times 



({ 


(( 


Si 


a 


«2 


€C 


tc 


S3 


tt 


S3 



s 

Sn 



JV 



In this way we obtain a series with 5i + •'2 + S3 + •••+«. 
elements which may be termed a reduced and weighted series 
since the larger Sk appears oftener than the smaller values of Sk- 
We shall now see if it is possible to determine the expected value 
of the mean and the dispersion if the series is supposed to follow 
the Bernoullian Law. 

The mean is defined by the following relation: 



< 5l > •« «2 > 

In 9 9 S 

M= — miH H — miH — m2+ ■ ■ ■ + - m2 

\_Si Si Si S2 
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< Sif > 

S 9 ~l 

+ • • • +-—mjf+ ■■■ + — mjf\^[si+S2+ ■■• + Sff] 

s 
_ Sk sl^rrii 

Denoting the average empirical probability by po we have 

2mfc : 2sit = po and, 

Mb = spo. 

As to the dispersion it takes on the following form: 
< «i > 

ff^ = ( —mi — spo j + ■ ■ ■ +1 — mi — spoj 



-Si- 



+ [jm, -sp,j+ ■■■+ [j^m, - sp,j+ ■ ■ 



^N 



+ f - m;v - «Po j + •••+(— m^ - spA \ 

-^ [*1 + «2 + • • • 5;,?] 

2**: —7714 — spo I 2 — {mk — SkPo) 

Sk / Sh 



'S.Sk 



2 

(fc= 1,2,3, •••iV). 



In finding the theoretical dispersion, assuming a Bernoullian 
distribution for which po may be used an an approximation of 
the mathematical a priori probability, we ask the reader to 
examine the general term of the expression for o^, viz. : 



— (m^ — SkPaY ■ 25i-. 



If the individual trials follow the BernoulUan Law the expected 
value of the factor (m* — SkPo)^ takes the form : 

e[(mk — SkPoY] = 2(7?ii — SkpaYipimk) = SkPaqa- 

This brings the general term for a^ to the form: 
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Thus the expected value of </'■ according to the Bernoullian 
distribution may be written as follows: 



s 



Ob = 2 --^r-spoqo = ^^-spoqo, or : tr^ = fVspoqo, 

where as before po = Sm;fe : 'Z,Sk and / = -. / 

These formulas give us the means of computing the Lexian 
Ratio and the Charlier coefficient of disturbancy in the ordinary 
way. Some of the computations require, however, a great 
amount of arithmetical work and the goal is reached more 
easily by making use of the mean deviation (in § 74a). 
We found there the following relation: 

0- = 1.2533??. 

In the weighted series it is readily seen that the value of & 
will be of the form: 



Ssi 



s 

— mfc — spo 

Sk 



Ssl mj: — SkPa I 



'2sk ^Sk 

If the series may be assumed to follow a Bernoullian distri- 
bution we have 

ffg = 1.2533!?. 

From the above formulas it is readily noticed that we may find 
the mean and the dispersion directly from the observed series 
without a preliminary reduction to a common number of com- 
parison s. This is in fact the method used in the above example 
of coal-mine accidents in various states. We have: 

po = Sm* : ^Sk = 2,167 : 726,659 = .002982, 

^ ^ ^s\m k--SkPo\ ^ ^ QQQ ^ 7Qg . 726^659 = 0.9757, 

0- = 1.2533 X !? = 1.223, 

<tb' = Pmqo = 1^^ X 1,000 x 0.997 x 0.003 = 0.0817, 



100Vcr2-(r/ ^^ 
lOOp = -^ = 40 approx. 
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The large value of the Charlier coefficient of disturbancy 
clearly shows that conditions in coal mines by no means are 
uniform in the whole union but vary greatly according to the 
locality. An actual computation shows in fact that in a few 
states such as Michigan and Iowa we find an imaginary coeffi- 
cient of disturbancy whereas States as Ohio and West Virginia 
exhibit marked hypernormal series with a large coefficient of 
disturbancy. The establishment of this fact is of some im- 
portance in connection with accident assurance. Many sta- 
tisticians seem to be of the opinion that a standard accident table 
computed from the data of the whole union ought to serve as 
the basis for assurance premiums. Such a table would assume 
uniform conditions all over the union. The enormously high 
value of p as computed above shows the fallacy of such a view. 

91. Secular and Periodical Fluctuations. — In the last para- 
graphs we have just learned how to detect the presence of dis- 
turbing influences in a statistical series. A value of the Lexian 
ratio differing from unity or a value of the Charlier coefficient of 
disturbancy differing from zero indicates the presence of fluc- 
tuations in the chances for the event or phenomena under in- 
vestigation. After having established the presence of such 
fluctuations it is the duty of the statistician to trace the sources 
of the disturbing influences. This is in general done by means of 
the theory of correlation, which will be discussed in the second 
volume of this work. 

It is, however, possible to classify the disturbances under two 
categories which by Charlier are termed as secular and periodical 
variations.^ The periodical fluctuations are in general difficult 
to discuss on account of the variations in the period of the dis- 
turbing forces. In many cases we are in absolute ignorance 
about the length of such a period and therefore unable to subject 
the series to a mathematical analysis. If the length of the period 
is known it is indeed not difficult to determine the periodical 
disturbances. This is often the case in series giving the occur- 
rence of a certain disease in various months. In statistics giving 
the frequency of malaria in a community, the observed cases are 

1 Lexis uses the terms "evolutionary" ("symptomatic") and "periodical" 
("oscilating") for such fluctuations. 

12 
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nearly all limited to the warmer months and infrequent in the 
winter months. 

In the secular fluctuations due to certain outward influences 
working continually in the same direction it is quite easy to 
calculate the rate of such variations. 

Let j3 denote the increase (decrease) of the original probabilities 
(pi, p2, p3, ■ ■ ■ pn) from set to set in the given statistical series 
so that 

P2 — Pi = |8 

P3 — P2 = |8 



Vn-1 = /3 



We then have: 



Pk = Px+ {k - l)i3. (1) 

The mean probability has the value: 

Pl + P2 + Pi + ■ ■ ■ + P^ 

Po - jr 

Pi + Pi+P + Pi+20+ ■■■ +pi+{X- l)/3 



N 



(2) 






Eliminating pi from (1) and (2) we have: 

Pk- po= yk - '-^-- j /3. 



If the observed and reduced numbers mi, mo, m^, • ■ • m^f may be 
regarded as approximately coinciding with sp\, sp2, sps, ■ ■ ■ spy 
we may write (2) as follows: 



Mk - M = (k ^ )sp ik= 1, 2, 3, • • N). 



(3) 



In order to obtain an expression for */3 in known quantities we 
must elimate the quantity k. ]\Iultiplying both sides of the 
equation (3) by k — {N -\- l)/2 we have: 
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Summing this expression for all values k from k = Itoh = N 
we have: 

^[Jc ^j(m,-ilf) = 5/3s(^fc-^^j . (4) 

The following expressions from the summation of series are 
well known to the reader from elementary algebra: 

f:k^=lNiN+l){2N+l), 

j:k=lN{N+l). 
1 - 

Substituting these values in (4) we obtain after a few simple 
trar; oiormations the following expression for 5/3: 

12 / iV + 1 \ 
'^= N{N^-1) ^V —Ji^'^-^- (5) 

Tn ', ECTJLAR Anntjai, Decrease op Nitmbbe of Stillbirths in Denmark. 
s = 70,000, N = 25, M = 1,735 



Year. 


*. 


rrik. 


mt- M. 


-^^ (• 


2 


-)(mi-Jlfj. 


1888 


1 


1,861 


+ 126 


-12 


— 


1,512 


89 


2 


1,924 


+ 189 


-11 


— 


2,079 


1890 


3 


1,830 


+ 105 


-10 


— 


1,C50 


91 


4 


1,779 


+ 44 


- 9 


— 


396 


92 


5 


1,811 


+ 76 


- 8 


— 


808 


93 


6 


1,788 


+ 53 


- 7 


— 


371 


94 


7 


1,719 


- 16 


- 6 


+ 


96 


1895 


8 


1,753 


+ 18 


- 5 


— 


90 


96 


9 


1,714 


- 21 


- 4 


+ 


84 


97 


10 


1,811 


+ 76 


- 3 


— 


228 


98 


11 


1,797 


+ 62 


- 2 


— 


124 


99 


12 


1,737 


+ 2 


- 1 


— 


2 


1900 


13 


1,696 


- 39 










01 


14 


1,732 


- 3 


+ 1 


— 


3 


02 


15 


1,694 


- 41 


+ 2 


— 


82 


03 


16 


1,685 


- 50 


+ 3 


— 


150 


04 


17 


1,682 


- 53 


+ 4 


— 


212 


1905 


18 


1,705 


- 30 


+ 5 


— 


150 


06 


19 


1,602 


-115 


+ 6 


— 


690 


07 


20 


1,723 


- 12 


+ 7 


— 


84 


08 


21 


1,694 


- 41 


+ 8 


— 


328 


09 


22 


1,665 


- 70 


+ 9 


— 


630 


1910 


23 


1,658 


- 77 


+10 


— 


770 


11 


24 


1,658 


- 77 


+ 11 


— 


847 


1912 


25 


1,638 


- 97 


+12 


— 


1,164 



Sum: -11,590 
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As an example illustrating secular fluctuations I take the 
previously discussed series of stillbirths in Denmark. 
We have in this case 

hence: 

s/3 = - 11,590 : 1,300 = - 8.92. 

From this we may draw the conclusion that the number of still- 
births in Denmark pr. 70,000 births per annum on the average 
is decreased by 8.92. 

If the fluctuations are of an essential secular character we may 
write 

m= M+ (k- 13) (- 8.92) 

as the number of stillbirths pr. annum. Apart from accidental 
fluctuations due to sampling we should therefore obtain a nearly 
normal series for the 25-year period if we calculated the number 
of stillbirths each year according to the expression: rrik 
— (k — 13) (— 8.92). Such a computation is given below: 

Number op Stillbirths in Denmark Freed prom Secular FLtrcTUATiONS. 

S.92). 







s = 


' 70,000, 


N = 25. 






Year. 


*:. mj- 


-(i-13)(- 


8.92). 


Year. 


k. mil- 


•(*-13)(- 


1888 


1 


1,754 




1900 


13 


1,696 


89 


2 


1,826 




01 


14 


1,741 


1890 


3 


1,741 




02 


15 


1,712 


91 


4 


1,699 




03 


16 


1,712 


92 


5 


1,740 




04 


17 


1,718 


93 


6 


1,726 




1905 


18 


1,730 


94 


7 


1,666 




06 


19 


1,675 


1895 


8 


1,708 




07 


20 


1,875 


96 


9 


1,678 




08 


21 


1,765 


97 


10 


1,784 




09 


22 


1,745 


98 


11 


1,779 




1910 


23 


1,747 


99 


12 


1,728 




11 
1912 


24 
25 


1,756 
1,745 



A computation of the characteristics of this series gives: 

M = 1,735, a- = 37.09, (Tb = 41.6, lOOp imaginary. 

The dispersion is now slightly subnormal and the coefficient 
of disturbancy is imaginary whereas in the original series in 
§ 88 it had a value equal to 3.4. 
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92. Cancer Statistics. — Mr. F. L. Hoffman in his treatise " Tlie 
MorTidity from Cancer Throughout the World" ci^■eJ some verj- 
interesting staiisTic* on mortality from cancer in ^-a^io^l^ locahties. 
Throui:h the kindness of Mr. Hoffman I am able to submit the 
follo\x-ing series relatir.j; to cancer among males iu the City of 
New York ^MimhaTtan and Bronx Boroughs : 

Deaths pbom CaXl ;:k •■: t'' ix tbk Cttt of Xe-w Yosk as REDrcEr to a 
SrArioxABT PoprLATiox of 1.000,000. 







s = 1.000.000, 


-V = 25. 


M = 560. 




Tear. 


Jt. 


m»> 


«»-.tf. 


-^?^- ( 


^- "-:-)<- 


1S-S9 


1 


377 


— Is3 


-12 


2.196 


ISiO 


'"> 


476 


— >4 


-11 


924 


91 


3 


410 


-15(.^ 


-10 


1.5O0 


V- 


4 


444 


-116 


— 9 


1.044 


93 


5 


4ti2 


— v»S 


— s 


7S4 


■H 


6 


423 


-137 


— 1 


959 


ls-i>o 


1 


442 


-lis 


- 6 


70S 


96 


S 


493 


— 67 


- 5 


3.%S 


u~ 


9 


505 


— 55 


- 4 


220 


US, 


10 


v"15 


- 45 


- 3 


135 


99 


11 


513 


— 47 


_ 2 


94 


1900 


12 


547 


- 13 


- 1 


13 


01 


13 


?i»o 


+ 35 








0-: 


14 


540 


- 20 


+ 1 


-20 


03 


15 


.5>0 


— 20 


+ 2 


40 


04 


16 


609 


-r 49 


-1 

— o 


147 


1905 


■ ~ 


639 


— "v 


+ 4 


316 


06 


IS 


61v> 


— 59 


— 5 


295 


o: 


19 


IV.S 


— - 9S 


— 6 


55.> 


OS 


■20 


631 


- 71 


-1- 7 


497 


09 


■"^l 


i^nS 


— 12-3 


^~ X 


9>4 


1910 


•>■■» 


no 


+150 


— i> 


1.350 


11 


23 


"10 


+150 


+10 


1.5tXi 


12 




721 


-161 


—11 


1 ~-\ 


1913 


■:o 


ris 


-15< 


-12 


:>f«.i 



= :< 27n 



A computation of the disp)er5ion and the Charlier coefficient 
of disturbancy gives a \"alue of lO0..-> in the neighborhood of IS, 
indicating marked fluctuations. An inspection of the series shows 
immediately that there is a markeii increase in the rate of death 
from cancer. Working out the secular disturbances in the ordi- 
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nary manner we find: 

„ 18,276 , , „„ 

^^^ = -1:300 = i4-o^ 

indicating an increase of death from cancer of about 14 persons 
pr. annum for a population of 1,000,000. Eliminating the secular 
disturbances in the same manner as above, we now get a coefficient 
of disturbancy equal to 0.983t {i = V — 1), practically a normal 
dispersion when taking into account the mean error due to 
sampling. 

93. Application of the Lexian Dispersion Theory in Actuarial 
Theory. Conclusion. — The Russian actuary, Jastremsky, has 
applied the Lexian Dispersion Theory in testing the influence of 
medical selection in life assurance.^ The research by Jastremsky 
evolves about the following question. Is medical selection a 
phenomena independent of the age of the assured? Let ^'^qx 
denote the observed rate of mortality after t years' duration of 
assurance. In the same manner qj^^^ denotes the rate of mor- 
tality of a life aged x after 5 or more years of duration (< ^ 5). 
Forming the ratio '"(/i : g^'^' for all ages of x we obtain a certain 
homograde series for which we may compute the Lexian Ratio 
and the Charlier Coefficient and thus determine if the fluctuations 
are due to sampling onlj or dependent on the age of the assured. 
Space does not allow us to give a detailed account of the very 
interesting research by Jastremsky as applied to the Austro- 
Hungarian Mortality Table (Vienna, 1909), and we shall limit 
ourselves to quote his final results as to the Lexian Ratio, L, 
for Whole Life Assurances and Endowment Assurances: 





Whole Life Assurances. 


Endowment Assurances 


t 


i 


L 


1 


0.88 


1.01 


2 


0.89 


0.96 


3 


1.12 


1.05 


4 


1.05 


0.98 


5 


1.07 


0.91 



The above values of L all lie close to unity and the series may 
therefore be considered as a Bernoullian Series where the fluctu- 

1 Jastremsky . "Der Auslese-KoefEzient," Zeilschr. f. d. ges. Vers.-Wiss., 
Band XII, 1912. 
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ations are due to sampling entirely. Or in other words, the ratio 
<Pt = '■'^Qx '■ ?i'^' is a quantity independent of the age of the 
assured. 

The great majority of statistical series may be subjected to a 
similar analysis as given in the preceding chapters. The char- 
acteristics as described previously, the Lexian Ratio and the 
coefficient of disturbancy, tell us the magnitude of possible fluc- 
tuations from sample to sample. In many cases we may by means 
of the secular coefficient of disturbancy, |8, partly or wholly 
eliminate such fluctuations, due to secular causes, and thus be in 
a better position to study the periodical fluctuations. 

A statistical research may be likened to the navigation of a 
difficult waterway, full of hidden rocks and skerries out of sight 
to the navigator. The amateur statistician, sailing the ocean 
in a blind and happy-go-lucky manner, often comes to grief on 
those rocks and suffers a total shipwreck. The skillful navigator, 
the mathematically trained statistician, is always on the lookout 
for the sea marks. In the Lexian Ratio and the Charlier Coef- 
ficient of Disturbancy he recognizes a beacon light, often signal- 
ling "Danger ahead." He stops his engines. In case he does 
not possess the particular charts giving the exact location of the 
hidden reefs his prudence advises him to call a pilot to bring his 
ship safely in harbor. On the other hand, if he has reliable 
charts and knows his profession thoroughly he may venture 
forth and do his own piloting, by a study of the charts. It is 
to the study of such charts — i. e., a special study of the higher 
statistical characteristics — that we shall turn our attention in 
the second part of this treatise. The reader who has followed us 
up to this point may perhaps feel discouraged by realizing how 
httle he has gained in knowledge after having learned a mass of 
technical detail and formulas. We can quite appreciate and 
understand this feeling. So far, he has perhaps chiefly been 
impressed by the treacherous and misleading character of sta- 
tistical mass phenomena, but to recognize a danger signal and 
thus avoid the pitfalls is one of the fundamental essentials in 
safe navigation in statistical research. 



PART II 

FREQUE^XY Cni\'ES -\XD 
HETEROGR.IDE STATISTICS 



CHAPTER XIII. 

THE THEORY OF ERRORS AXD FREQI'EXCY Cl'R\ES .A^'D ITS 
-\PPLICATIOX TO STATISTICAL SERIES HISTORICAL XOTES. 

94. General Remarks. The Hypothesis of Elementary Er- 
rors. — In the pie\'ious chapters we have discussed the elementarj* 
statistical parameters, the mean and the dispersion, together with 
the Lexian ratio and the Charlier coefficient of disturbancy and 
their application to the mathematical analysis of the homograde 
series. We shall now extend this discussion lo the parameters of 
higher ordei-s. stich as the skewness and the excess, and also give 
the theon" for a mathematical anahfis of the other great domain 
of statistical series, the heterograde series. 

The main reason for the separate treatment of the homograde 
staiistical series is on accotmt of their close analog\' to ordinars- 
mathematical probabilities. Whenever the inimber of comparison, 
s. may be rogavolcd as equivalent to the total number of equally 
possible cases in ordinar\- a priori probabiUties and the observed 
occurrences of the attribute v<?vent") as the favorable number of 
cases, wj, among the total ntimber of possible events. .>\ we are 
justified in reg-arding the ratio hi ; s in the light of a mathematical 
probability. For tliis reason all homograde series may be ex- 
plained as beir.i: subject to the same mathematical laws as those 
i:o^"crnini; onlinar\- a priori probabilities, which are ftilly explained 
by the series of BernouUi, Poisson ar.d Lexis and the various 
combinations of such series. Moreover, in all such series it is 
possible to compute both the mean and the liispei-sion by the 
indirect or combinatorial process instead of the direct or physical 
process. 

The nucleus of the three fimdament;il series, the Bernoulhan. 
the Poisson and the Lexian. as well as their various combinations 
is found in the development of the point binomial (p -f- g)^ of 
the Bernoullian Theorem as described in Chapter IX where the 
general term expressing the probability of the occurrence of an 
event E a times and of the complementary- event E. 3 = s — a 
times is i:i\en by the formula 



.(a) = (^)pV 



- a 
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The numerical computation of this exact expression becomes, 
however, too unwieldy for large values of s and we shall therefore 
try to replace it with a more flexible approximation, preferably 
by a continuous function or by a rapidly convergent infinite series. 
On page 101 we gave such an approximation formula for the 
maximum value of ^ (a), denoted by the symbol T„. We wish, 
however, to find a simpler expression for the more general term as 
well. This further development necessitates the determination of 
several higher characteristics or parameters than those expressed 
by the mean and the dispersion. If we should succeed in this task 
the homograde series can be fully explained by the theory of 
mathematical probabilities and placed upon a sound a priori 
basis. 

The question now arises whether the a priori theory of mathe- 
matical probabilities will furnish a similar basis for the second 
domain of statistics, the heterograde series. We are of course able 
to compute by means of the direct or physical process both the 
mean and the dispersion in various heterograde series, such as 
measurements on heights of adult males, number of fin rays in 
fishes or number of telephone calls over a trunk line in a given 
interval of time. But are we also able, like in the case of the 
homograde series, to forecast those parameters by means of the 
criterions of the Bernoullian, Poisson and Lexian series, i. e. by 
the indirect or combinatorial process? 

A simple consideration will soon lead us to the admission that 
no a priori reasoning or a simple theorem like that of the Ber- 
noullian will enable us to forecast the mean stature of Danish, 
Norwegian, Swedish or English adult males or the mean number of 
telephone calls over a trunk wire in a given interval of time. And 
while we by the physical or direct process can compute both the 
mean and the dispersion from previously collected statistical data, 
we have no way of knowing whether such parameters, purely 
empirical in form and nature, have any real significance beyond 
that of abstract mathematical calculations. Nor do such empirical 
parameters offer similar explanations as those of the homograde 
series. We are for instance able to predict the probability that in a 
series of 1000 successive drawings (with replacements) from a 
deck of whist the number of drawn aces will fall between 100 and 
120. But we are not able by an a priori reasoning or by mathe- 
matical deduction to forecast the probability that among 1000 
Scandinavian adult males — all chosen at random — the height 
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of an arbitrarily selected individual will fall between 170-175 
centimeters. 

Experience has, however, shown that the heterograde series 
show similar grouping tendencies around the mean value as those 
encountered in the homograde series. As an example we may 
compare the BernouUian series of black cards in sample sets of 10 
as collected by M. Charlier and shown on page 138 and the Poisson 
series of black balls in sample sets of 20 collected by Mr. Bonynge 
(shown on page 143) with a series of measurements, relating to the 
heights of Danish conscripts for the year 1916. Such a comparison 
is given below. 



Charlier 


■'s 




Data 




m 


X 


Fix) 





— 5 


3 


1 


— 4 


10 


2 


— 3 


43 


3 


— 2 


116 


4 


— 1 


221 


5 





247 


6 


+ 1 


202 


7 


+ 2 


115 


8 


+ 3 


34 


9 


+ 4 


9 


10 


+ 5 








Data 




m 


X 


Fix) 





— 5 


2 


1 


— 4 


9 


2 


— 3 


35 


3 


— 2 


52 


4 


— 1 


86 


5 





109 


6 


+ 1 


85 


7 


+ 2 


69 


8 


+ 3 


30 


9 


+ 4 


16 


10 


+ 5 


6 


11 


+ 6 


1 



Danish Anthropometric 


Data 




w 


X 


Fix) 


140-145 cm. 


— 5 


32 


146-150 " 


— 4 


44 


151-155 " 


— 3 


243 


156-160 " 


— 2 


1284 


161-165 " 


— 1 


3777 


166-170 " 





5742 


171-175 " 


+ 1 


4796 


176-180 " 


+ 2 


2129 


181-185 " 


-1-3 


588 


186-190 " 


-1-4 


81 


over 191 " 


4-5 


11 



1000 18727 

500 

The grouping tendency or the clustering around the mean value 
is manifest in all three series; but while this tendency in the case 
of the two homograde series as offered in the experiments by 
Charlier and Bonynge may be fully explained by means of the 
theorems of mathematical probabilities no such reasoning is 
sufficient to explain the clustering tendency of the heterograde 
series relating to Danish conscripts. The calculus of probabilities 
in itself would not be sufficient to explain the grouping tendency of 
the variates in a heterograde series unless a general hypothesis 
will aid us in explaining the variation among several heterograde 
objects in respect to a specific attribute. 

Thus the question which now confronts us is whether it is 
possible to establish a simple hypothesis which will enable us to 
extend the principles of the mathematical theory of probabilities 
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to the domain of the heterograde series and to build up a theory 
similar to that of the homograde series. The great Laplace was 
the first to solve this problem, and his investigations and analysis 
on this important subject are indeed some of the most important, 
but also some of the most difficult to follow in his Theorie des 
Probabilites. The hypothesis employed by Laplace in explaining 
the phenomena of variation in a heterograde series is the hypothesis 
of elementary errors. The hypothesis was later on somewhat 
simplified by the German astronomer and engineer, Hagen, and 
it has of late years been further developed through the elegant 
researches of the Scandinavian astronomer and statistician, 
M. Charlier. According to the Laplacean — Hagen — Charlier 
theory every variate or individual deviation from a certain norm is 
generated as the sum of a muss of small and unknown quantities — 
generally infinite in number — which are known as elementary errors 
{deviations) . The word error must of course be taken in a different 
sense than that we usually associate with the word. In precision 
measurements we are actually dealing with true or natural errors 
arising from imperfections of the instruments and the observer, 
but it would of course not be right to regard a deviation of say 
5 centimeters from the mean stature of a population group as an 
error in the usual sense of the word. Used in its wider sense as an 
expression for deviations the term will, however, be readily under- 
stood, and it is in this sense we shall use it in the following pages. 
Expressed in mathematical symbols the hypothesis of elemen- 
tary errors may be presented as follows. Let x^ (where k = 
1, 2, 3, . . s denotes the kth error source among a total of s 
sources) represent the magnitude of a statistical variate expressed 
as a deviation (error) from a certain norm, then 

fk (r) (fc = 1, 2, 3, s) 

may be regarded as the probability that xj, assumes the value r. 
As to this particular elementary error probability function Laplace 
makes no other assumptions than those which follow directly from 
the definition of a mathematical probability. That is to say 

0<fk{r)<\, 
where A; = 1, 2, 3, . . s and r = ± 1, ± 2, ± 3, . . ± => 

Since it is certain that one or more of the above values of r, 
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whether positive or negative, are bound to occur, we have evidently 
the relation 

r = +CO 

^hir) = l(fc = 1,2. 3, . . .s). 

r= — to 

The epoch-making analysis of Laplace lies in the determination 
of the unknown fimction /^-(r) from such simple and general 
assumptions. 

95. Application to Statistical Series. Definitions. — The La- 
placean-CharHer hj-pothesis of elementar\' errors opens the way 
for a mathematical analysis of a vast number of statistical data 
and series, which we shall brieflj' discuss in the following para- 
graph. First of all we submit therefore the following definition of a 
statistical object. 

A number of similar objects {a species) which can be arranged in 
numerical order according to the measurable variation of ^ certain 
observed attribute (character), also called a variate, is known as a 
statistical object, eventually as a statistical series. 

It is readily seen that this definition covers a wide range of sub- 
jects and that statistical methods instead of being applicable to 
social and economic problems only are equally useful in botam-, 
zoolog}', biologj- and even in astronomy, physics or chemistrs'. 
^Moreover, since the de%-iation of an indi^•idual variate of the 
statistical series as measured from a certain arbitrarily choosen 
norm e^"idently may be regarded as the sum of several elementary- 
errors (the word error to be taken in its wider sense) , it is e\ideiit 
that the statistical object can be subjected to a mathematical 
analj-sis on the basis of the theorj- of errors. 

A simple consideration will also con%'ince the reader that the 
above definition covers not alone the heterograde series but also 
the homograde series. For instance in the Bernoullian and Poisson 
series as presented in the experiments by CharHer and Bom-nge 
on page 170, the munber m, which gives the number of favorable 
events in each sample set. may be considered as a statistical 
variate and F(.r) as a statistical series. 

This simple fact is of the utmost importance since it makes 
it possible to treat both the homograde and heterograde 
series on the common basis of elementarj' errors and links 
in the case of the homograde series the a priori mathematical 
probabihties witli the a posteriori probabihties. Such connection 
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is of special interest in the further treatment of the celebrated 
Rule of Bayes. 

While thus the homograde and heterograde series may be 
viewed from a common viewpoint it is, however, necessary to 
point out a distinct difference in the nature of the statistical 
variates themselves. In one case we find the variate (the measura- 
ble attribute) expressed in whole numbers only, such as the number 
of fin rays in fishes, petals in flowers or the occurrence of a specified 
color in card drawings. The variates are in such cases known as 
integral variates. The observations on tail fin rays of flounders by 
the Danish biological station, on page 131 offer such an example. 
As a further illustration of integral variates we choose the follow- 
ing statistical series from the observations of the English phys- 
icists, Rutherford and Geiger. Messrs. Rutherford and Geiger 
counted the numbers of alpha particles radiated from a bar of 
polonium during a long series of intervals, each lasting one-eighth 
of a minute. The table states the number of times, F{x), the 
number of particles omitted in this interval had a given value, x. 



X 


Fix) 





57 


1 


203 


2 


383 


3 


525 


4 


532 


5 


408 


6 


273 


7 


139 


8 


45 


9 


27 


10 


10 


11 


4 


12 





13 


1 


14 


1 



As an example of a very slight variation the Danish biologist and 
botanist, W. Johannsen, quotes the following observations by 
his colleague Professor Raunkjoer of Copenhagen on the number of 
involucral leaves of 100 samples chosen at random of taraxacum 
erythrospernum: 

No. of Leaves Frequency 
X F(x) 

13 99 

14 1 
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In other cases it is not possible to express the measure of the 
attribute in whole numbers. Thus measurements of stature, 
chest circumference and weight of recruits, or measurements of the 
percentages of sterility in wheat, barley, rye and oats will in 
general possess all possible fractional values between two integral 
numbers. Hence we must group the observations in classes, and 
such classified variates are known as graduated variates. 

The measurements of heights of Danish conscripts for the 
year 1916 and shown on page 17(J offer an illustration of grad- 
uated variates. Another case is furnished in the number of deaths 
by attained ages in a mortality table. In most mortality tables 
the deaths are given by integral ages only and represent therefore 
strictly speaking integral variates. 

In biology we encounter numerous homograde series especially 
in investigations on dimorphism or pol3Tiiorphism. Johannsen of ' 
Copenhagen produced from crossbreeding between a species of 
beans with white blossoms and yellow seeds and a species with 
violet blossoms and black seeds a bastard species with violet 
blossoms and muddy colored seeds. The offsprings — 558 individ- 
ual plants — of this bastard showed following variations: 

White Blossoms Violet Blossoms 

160 398 

Color of Seeds Color of Seeds 

I I 

I I ,1 I 

yellow bronze violet black 

39 121 105 293 

In respect to blossoms we have two alternatives, in respect to 
seeds 4 alternatives. 

As a few illustrations of the wide range of variable phenomena 
which allow to be classified as statistical objects, we present the 
following table: 
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Petals of Flowers. 
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It is to be noted that in the primary observations on homograde 
series the numbers are all abstract, whereas the heterograde series 
consist of concrete numbers. Another peculiarity of the homo- 
grade series is that they are always connected with the number of 
comparison, s, which is absent in the heterograde series. 



96. Compound Frequency Curves. — According to the Laplacean-Charlier 
hypothesis any frequency curve may be considered as being generated as a 
sum of independent frequency curves and represents therefore in the final 
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instance really a compound frequency curve. Mathematically speaking we 
may therefore consider any frequency curve, no matter of what form, as 
being represented by the symbolic relation. 

Fix) = 2 Ni(pi(x) for i = 1, 2, 3, . . . 

The functions <Pi(x) may sometimes all be normal or Laplacean probability 
curves. On the other hand, by assigning different values to Ni or the areas 
of the separate curve we may obtain a compound curve of wavelike form. 
Suppose for instance we had two samples of observations on the heights of 
say 100,000 Japanese recruits and 10,000 Danish recruits, each individual's 
measure written on separate cards. Suppose furthermore that we mixed those 
110,000 cards in an urn, and then formed a new frequency distribution of this 
mixture This new frequency distribution would be a compound curve with 
two strongly pronounced crests or maximums. One (the Japanese) clustering 
around the value of 160 centimeters, while a smaller crest (the Danish) would 
tend to cluster around the value of 170 centimeters. 

Another instance is offered in the distribution of the frontal breadths of a 
Naples specimen of the crab, Carcinus rrwenas, as measured by Weldon. Wel- 
don thought it very probable that this rather skew frequency distribution 
was produced by a fusion of two distinct races or species of individuals, which 
were clustered symmetrically around separate means. The distinguished 
English biometrician, Karl Pearson, tested this hjrpothesis for him and 
analysed the compound curve as two component curves representing respec- 
tively 58.55 and 41.45 per cent of the total area of the compound frequency 
curve. Thus Weldon's hjrpothesis was verified by a mathematical analysis. 

A quite different type of example is offered in the frequency distribution of 
deaths by attained age as represented by the d^ column in any ordinary mor- 
tality table. The fact that the deaths in the d^ column showed a marked 
clustering tendency, strongly suggestive of the normal Laplacean curve around 
the age group 70-75, was already noted by Lexis, who in this way made a 
very interesting attempt to determine what he called a Normalaltek for 
the age of man. Later on Italian statisticians took up the problem and ana- 
lysed the dx curve of Italian life tables as a sum of several normal frequency 
curves. Karl Pearson was the third investigator to take up the problem in a 
very fascinating essay in his Chances of Death. Pearson pictures Death as 5 
marksmen shooting at a human target passing over the Bridge of Life. Each 
marksman aims with different precision and skewness. The result is 5 com- 
ponent skew curves. 

Although the brilliant and perfect literary style of the eminent English 
biometrician rouses the admiration and brings forth the reader's unstinted 
praise, I can, however, not help being in accord with the distinguished Amer- 
ican biologist and statistician, Raymond Pearl, who in his 1920 Lowell Lec- 
tures, although speaking in the highest terms of praise of Pearson's work, 
characterized a mathematical analysis of this kind as being nothing more than 
a highly interesting and neat graduation formula but wholly void of any 
biological significance. 

It is as a mere matter of fact a comparatively easy matter to break up any 
death curve of a mortality table into separate mathematical components. As 
an example of such a process I offer the illustration in Chapter XVI, where 



178 THEORY OF ERRORS AND FREQUENCY CURVES. [97 

I have broken up the recently published American AM^^'> mortality table 
into two curves of the Gram-Charlier type, obtaining as good results as the 
Italian investigators and Pearson, who use 5 component curves. 

But as already pointed out by Lexis "a mere mathematical analysis in 
component groups does not enlarge our knowledge of the causal relationships. 
It would be a quite different matter, however, if it were possible to establish 
clustering tendencies around definite ages for each of the more important 
causes of death." 

An attempt to do this has been made by the present writer in his forth- 
coming book An Elementary Treatise on Frequency Curves and their Applica^ 
tion to the Human Death Curve. I start with the hypothesis "that the frequency 
distribution of deaths at attained ages classified according to certain groups 
of causes of death among the survivors in a mortality table tend to cluster 
around specific ages in such a manner that their frequency distribution can 
be represented by a Gram-Charlier frequency curve." If this hypothesis can 
be accepted as having a sound biological basis I have shown that it is possible 
by a mathematical analysis resting on such hypothesis to construct mortality 
tables from mortuary records by sex, attained age and cause of death, and 
without any information about the number of lives exposed to risk at various ages. 
This proposal has been met by a storm of protest from many American ac- 
tuaries, who claim that I have attempted the impossible. Final judgment 
should be suspended, however, until the actual appearance of the work, which 
I think must be judged from a biological rather than from a mathematical 
point of view. The fact that the method has given good results in the con- 
struction of many mortality tables among highly different races and occupa- 
tions must, I think, be attributed to purely biological causes and not to ac- 
tuarial or mathematical methods, which in the process have been employed 
as a mere tool, as a means rather than as an end. 

97. Early Writers. — The idea of frequency curves or frequency 
distribution is probably very old. It very likely arose in the mind 
of man when he began to make quantitative observations. Un- 
doubtedly the surveyors and engineers of the people of ancient 
civilization had noticed that successive and independent measure- 
ments of the same object often showed variations. On the other 
hand we have no means of knowing if the ancient geometers and 
mathematicians knew how to estimate and value such variations 
from the true value of the object. It is probable that the great 
Greek astronomers, such as Hipparch and Aristarch in their 
astronomical observations have employed some rational method 
of allowing for errors due to the instruments and the individual 
observer, but no records are available so as to settle this question. 

The great Danish astronomer, Tycho Brahe, the father of mod- 
ern astronomy, on the other hand made careful adjustments for 
errors of observations and has left us records on the systematic 
method of such adjustments. 
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However, it was not before the close of the eighteenth century 
that the errors of observations were subjected to a mathematical 
treatment. The first known writer on the mathematics of the 
subject was the English actuary, Thomas Simpson, a most remark- 
able self-taught mathematician who in 1757 issued his "Miscel- 
laneous Tracts on some curious and very interesting subjects in 
Mechanics, Physical Astronomy and Speculative Mathematics." 
In this interesting and instructive little book is found a chapter 
entitled "An Attempt to shew the Advantage arising by Tak- 
ing the Means of a Number of Observations in Practical As- 
tronomy." 

About 15 years later the French mathematician, Lagrange, took 
up the ideas of Simpson in a memoir, which at the time caused 
considerable notice in mathematical circles. Lagrange in his 
treatment followed a course very much similar to that employed 
by de Moivre in the discussion of the problem which bears his own 
name. 

In 1778, Daniel Bernoulli in the scientific publications of the 
Russian acade-ny of Petrograd subjected the memoirs of Lagrange 
to a searching criticism and proposed the first mathematical 
formula for a frequency curve or curve of errors around the mean. 
Bernoulli suggested as a law of error or frequency function, <p (x), 
the following expression: 

(p(x) = -{- ■\/r'^ — x\ where r is a constant 

This ecjuation represents a symmetrical semi-circle and gives as 
we shall have occasion to show at a later stage a rough approxima- 
T,ion to the presumptive law of error. 

A very important contribution to the theory was also made 
by the American, Adrain, in his journal "The Analyst." 

98. Laplace and Gauss. — Laplace was the next mathematician 
to take up the subject of frequency curves in his monumental 
work "Theorie analytique des probabilites." The great French- 
naan dealt with the subject in a manner which leaves little to 
be desired. M. Charlier, the eminent Swedish astronomer and 
statistician, has justly remarked that among the various deductions 
of the law of errors, the exhaustive researches of Laplace occupy 
beyond doubt a leading position because of their generality and 
far reaching applications. On the other hand, the analysis of 
Laplace is by no means easy to follow in all its details and the 
4th chapter of the "Theorie Analytique des Probabilites" accord- 



180 THEORY OF ERRORS AND FREQUENCY CURVES. [ 98 

ing to a remark by Todhunter in his "History of Probabilities" 
forms one of the most important but at the same time also one of 
the most difficult parts of the great work. 

No doubt the extreme difficulty of fully mastering the far 
reaching but intricate analysis of Laplace was realized by the 
mathematicians. Already his friend and disciple, Poisson, realized 
this and issued in 1832 a note entitled "Sur la probability des 
resultats moyens des observations." But the wealth of ideas in 
Laplace's treatise and their wide range of application were really 
never fully recognized for almost a full century when they were 
taken up by Charlier, who more than any one else has proven 
their great worth as the most general and direct basis for a com- 
plete theory of frequency functions and the associated problem of 
correlation. 

In the meantime Laplace's method had been supplanted by the 
independent and contemporary researches of the great German 
mathematician, Gauss. The method employed by Gauss in 
deriving his law of error or frequency curve and the therewith 
associated criterions of the method of least squares is undoubtedly 
very simple and elegant and much easier to follow for the beginner 
than the analysis of Laplace. Gauss in his studies confided himself 
to the so-called precision errors or errors arriving from repeated 
measurements by means of physical instruments, such as astronom- 
ical or geodetic observations or measurements in experimental 
physics or chemistry. 

The ideas put forth by Gauss were followed up by a number of 
astronomers and physicists, such as Bessel, Encke, Hansen, and 
Hagen of Germany, Andrse, D'Arrest, and Gylden of Scandi- 
navia, Airy, Herschell, and Tait in England, Laurent in France, 
and Newcomb and Chauvenet in America. And the Gaussian 
methods are still used exclusively in preference to those introduced 
by Laplace in most of our text-books on theory of errors and the 
related subject of least squares. 

One reason for this preference for the theory of Gauss apart 
from its simplicity of representation is to be looked for in the fact 
that until a comparatively recent date the majority of applications 
of the theory of frequency curves or error curves had reference to 
precision measurements. As pointed out by N. R. Jorgensen,' in 
his excellent Danish treatise on "Frequency Surfaces and Correla- 
tion" it will, as a rule, be found that the Gaussian error law may be 

1 Underscigelser over Frequensflader og Korrelation (Copenhagen, 1916). 
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regarded as an excellent method of approximation, which becomes 
well-nigh perfect in the case of errors of precision measurements 
with delicate instruments in the hands of carefully trained ob- 
servers. The Gaussian frequency curve may therefore be said to 
fulfill all the requirements in praxis of a law of error, where we are 
concerned with errors in the true sense of the word. 

99. Quetelet's Studies. — Matters became, however, quite dififer- 
ent when the biologists and economists began to employ math- 
ematical analysis in their research work. It was the great Belgian 
astronomer and statistician, Quet«let, who first introduced exact 
measm-ements in the study of biological and anthropological 
phenomena and showed that a number of collected statistical data 
on heights, weights and chest measurements of recruits exhibited a 
close conformity to the Gaussian law of error, although the varia- 
tion among the individual objects as measured could not be con- 
sidered solely as errors in the original sense of the word. 

Investigations along this line were greatly accelerated by the 
discoveries of Quetelet. All sorts of measm-ements were taken and 
the rapidly growing collections of statistical data relating to 
economic and social conditions as recorded by various govern- 
mental statistical bureaus furnished material for further investiga- 
tions. But unfortimately in all these investigations the Gaussian 
error law came to act as a veritable Procrustean bed to which all 
possible measurements should be made to fit. The belief ia 
authority so typical of modern German learning and which had 
also spread to America was too great to question the supposed 
generality of the law discovered by the great Gauss. Statisticians 
could not conciliate themselves with the thought of the possible 
presence of "skew" frequency curves, although numerous data 
offered complete defiance to the Gaussian dogma and exhibited a 
markedly skew frequency distribution. Supposedly great author- 
ities argued naively that the reason the data did not fit the curve 
of Gauss was that the observations were not nimierous enough to 
eliminate the presence of skewness. In other words, skewness was 
regarded as a by-product of sampling and was believed could be 
made to disappear completely if we could take an infinite number 
of observations. 

Voices had, however, been raised against these energetic but 
futile Procrustean efforts. Already Quetelet realized the existence 
of skew frequency curves. This is clearly brought out in his 
correspondence on this subject with Mr. Bravais of the Ilcole 
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Polytechnique of Paris as published in an appendix to his "Lettres 
sur la Theorie des probabilit^s." 

100. Oppennan, Gram, and Thiele. — Neither Quetelet nor 
Bravais succeeded, however, in giving a complete mathematical 
treatment of the theory of skew frequency curves. The first com- 
plete mathematical demonstration of this aspect of the matter was 
given by various Scandinavian investigators. A Danish actuary, 
Opperman, was probably the leading spirit in organizing the 
revolt against the belief in authority as preached by the adherents 
of the doctrine of Gauss. Opperman, who was a self-taught 
mathematician, seems to have looked with suspicion on many of 
the researches by German mathematicians of the latter half part 
of the nineteenth century. He was a great admirer of the early 
Scotch and English mathematicians with whose works he was 
thoroughly familiar, and it is said he took great delight in pointing 
out how many of the lengthj- and formidable German demonstra- 
tions in the realm of the theory of functions had been demon- 
strated in a more elementary and clearer manner by such men as 
Wallis, Stirling, MacLaurin, Gregory, Briggs and Napier. As a 
practical actuary and managing director of the Danish Govern- 
ment Life Assurance Fund he had ample opportunity to notice 
that many frequency distributions occurring in actuarial work 
offered a notorious defiance to the frequency curve of Gauss. 
Around Opperman there gathered a small group of young en- 
thusiastic students of mathematics among whom we may espe- 
cially mention Gram and Thiele and to whom he expounded his 
ideas. Opperman himself wrote veiy little and always in a con- 
densed form. A re^dewer of his work remarks that he rewrote his 
essays several times so as to be able to represent on a single page 
what other mathematicians usually required a dozen of pages tc 
express. He has left very little material bearing on the theory of 
frequency curves, but his discussions on this subject with his 
younger disciples evidently bore fruit. 

J. P. Gram was the first mathematician to show that the normal 
symmetrical Gaussian error curve was but a special case of a more 
general system of skew frequency curves which could be repre- 
sented b}' a series. In his verj' original doctor'.s thesis in Danish 
on "The development of series hy means of the method of least 
squares" (Copenhagan, 1879) ^ he extended some theories orig- 
inally expounded by the Russian mathematician, Tchebychefi', to 
' Om Rakkeudviklinger. 
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the representation of frequency functions by means of a series. 
By using the Gaussian curve 

- {x - MY 
2 0-2 

as a generating function Gram showed that an arbitrary frequency 
function could be represented approximately by a series of the form 

F{X) = CoV (X) + Cl if' (X) + C2 <p" (X) + C3 ip'" (X) + C4 ip"" (x) 

+ . . . 

In this development Gram established some far reaching prop- 
erties of infinite determinants and their relations to orthogonal 
functions which later have become of much use in the recent epoch- 
making researches on integral equations by the Swedish math- 
ematician and actuary, Fredholm. 

To Gram, therefore, belongs the honor of having been the first 
mathematician to give a systematic theory for the development of 
skew frequency curves. 

While Gram's later work as a managing director of a life insur- 
ance company occupied most of his time and left but little oppor- 
tunity for purely mathematical work his friend, T. N. Thiele, began 
to lecture at the University of Copenhagen on the general theory of 
observations. The substance of these decidedly original lectures 
was in 1889 published in book form under the title "A General 
Theory of Observations." ^ 

In several respects this work occupies a dual position to the 
work of the great Laplace, although Thiele is set like flint against 
the idea of basing the theory of probabilities on the conception of 
an a priori probability. In his lectures he always maintained that 
the greatest benefit derived from the study of the method of least 
square was that the student learned where not to use it. Among 
one of the great achievments of Thiele is the introduction in the 
theory of frequency curves of a certain system of statistical char- 
acteristics to which he gave the name of semi invariants and which 
are practically identical to the system of moments later on intro- 
duced by Pearson. By means of these semi invariants Thiele 
arrived at the same series as deduced by Gram. In Thiele's work 
we also find a very original treatment of the theory of correlation 

^ Almindelig lagttagelseslaere. 
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originally introduced by Bravais. But instead of the term correla- 
tion he uses the words "bonded observations." 

Like Laplace's "Theorie des Analytiques des Probabilites " 
Thiele's original work and its subsequent abridged translation in 
English offers by no means an easy reading, especially to the 
beginner. It contains, however, like the work of Laplace a verita- 
ble wealth of ideas and methods which remain unsurpassed in the 
realm of mathematical statistics and no serious worker on the 
general theory of observations can afford to neglect to study the 
original works of Laplace and Thiele. 

101. Modem Investigations. — The investigations of Gram and 
Thiele bring us up to the close of the nineteenth century. Their 
ideas reached but a small number of students of mathematical 
statistics because of the very limited knowledge of Scandinavian 
languages among mathematical readers in general. But from the 
beginning of the nineties other voices began to be heard against 
the Gaussian dogma. In Germany it was Fechner who first 
entered the ranks of the opposition with his so-called "zweispa- 
Itiges Gesetz." His work was continued by Lipps and the Leipzig 
astronomer, Bruhns, who by the publication in 1906 of his " Kollek- 
tivmasslehre " gave an almost complete theory of frequency curves 
where we again find the series originally developed by Gram and 
Thiele. 

Although quite considerable valuable work has thus been done 
in Germany along the lines of frequency curves it was, however, 
in England that the renewal of the classical probability theory 
took place with the renowned memoirs by the English math- 
ematician, Karl Pearson, entitled "Contributions to the Math- 
ematical Theory of Evolution" in Philosophical Transactions for 
1895. 

Since that year Pearson has produced a certain type of statistical 
literature, of almost Gargantuan proportions. The quarterly 
journal "Biometrika" of which he is the editor is devoted to the 
mathematical study of biological problems. When Pearson first 
introduced his famous types of curves (now more than" 12 in num- 
ber) the study of frequency distributions was greatly accelerated. 
The application of these curves to biological problems was ap- 
parently so simple that they were used in a rather loose manner 
by many biologists and anthropologists who had but little training 
in mathematical analysis. Examples of such pseudo mathematical 
analysis are especially found in the writings of the American 



101 ] MODEHN IXVESTIGATIOXS. 185 

anthropologist, Franz Boas, which may be held up as a warning 
to all statisticians to keep away from the higher mathematical 
analysis of collected statistical data imless they are familiar with 
the tools of the probabihty calculus. 

Such misuse can of course not be laid at the door of Mr. Pearson 
who indeed has protested ^-igorously against the erroneous appli- 
cation of his methods by investigatoi-s of the Boas type. On the 
other hand, it is equally true that ^Ir. Pearson at times has rehed 
too much on his mathematical formulas and violated the maxim 
of the Danish biologist, Johannsen, that "we must practice 
biolog}' with mathematics and not as mathematics." 

The immense production of Pearson coupled with his well-nigh 
perfect and forceful st>'le of writing has to a certain extent over- 
shadowed the researches of his compatriot, Edgeworth, whose 
works according to the Danish actuarj' and mathematician, 
Jorgensen, are greatly superior to those of Pearson both in scien- 
tific rigor and in practical applications. Edgeworth has deduced 
the pre\"iously mentioned series by Gram and Thiele in a ver\' 
elegant form in the Cambridge Philosophical Transactions (1904) 
and he has in a series of articles in the Journal of the Royal Statis- 
tical Society outdistanced many of his contemporaries among 
the mathematical statisticians. Unfortunately Edgeworth's con- 
tributions have not gotten the attention thej- deserve, probablj^ 
because of the rather fragmentary and unsj'stematic manner in 
which they have appeared. Among the new methods introduced 
by Edgeworth we may particularly mention the so-called method 
of translation. 

The Pearsonian t\-pes of frequency curves are represented bj' 
formulas which in mathematical language are termed closed 
expressions in contradistinction to the development in series. This 
latter method is still being preferred by the Scandina\aan math- 
ematicians imder the leadership of the Swedish astronomer Charher. 
CharHer started his first investigation with a small brochure en- 
titled '■ Uber das Fehlergesetz " in the MeddelendanioT 1905, wherein 
he followed the method originally introduced by Laplace and in a 
most elegant way of deduction reached the series of Gram and 
Thiele. He has since that year pubhshed a series of small mono- 
graphs on various aspects of mathematical statistics and their 
appHcation to stellar statistics which beyond doubt are destined to 
become classics in the historj- of probabilities. 

Charher has shown that all frequency' curves fall into two t^-pes 
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which he has designated as type A and type B. The A type is the 
usual expansion of Gram and Thiele with the normal frequency 
curve as the generating function. Type B which covers decidedly 
skew frequency curves is given by the series 

Fix) = Co ^ (x) + |-j A ^ (x) + |J A^ ^ (x) + . . . 

( - 1)" 



X 
\ p \cosO} 



where 

g - A /• \cosU) 

<fj (x) =— J e cos [Xsmoj — xcojdco 



is the generating function. 

The decidedly constructive work begun by Charlier has been 
ably supplemented by his talented disciple Wicksell and the 
Danish actuary, Jorgensen. Wicksell in 1920 issued in Swedish a 
series of lectures on mathematical statistics delivered during the 
autumn of 1919 before the Swedish Assurance Society. He has 
also written numerous excellent monographs on mathematical 
statistics and their application to vital statistics. 

N. R. Jorgensen issued in 1916 his large octavo volume on 
"Researches on Frequency Surfaces and Correlation" ^ which 
beyond doubt is the most important work among the contributions 
of Danish actuaries since the appearance of the memoirs of Gram 
and Thiele. Jorgensen's systematic treatise has greatly furthered 
the studies of the Scandinavian school both in theoretical and 
practical aspects. A very important feature of his book is the 
insertion of an extensive collection of numerical tables of various 
functions which greatly facilitates the practical applications of the 
theory. These tables, many of which are the results of his individ- 
ual efforts, hold equal rank with the well-known "Tables for 
Biometricians and Statisticians" edited by Karl Pearson in 1914. 

Besides the writings of Charlier, Wicksell and Jorgensen a 
number of Scandinavian mathematicians, actuaries and statis- 
ticians have contributed valuable researches both on frequency 
curves and correlation methods. We may especially mention such 
men as Guldberg, Gyllemberg, Malmquist, Burrau and Lundquist. 
In this group we might also include the Danish biologist, Johann- 

^ Undersogelser over Frequensflader og Korrelation. 
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sen, whose writings on the theory of heredity are recognized as 
standard texts on the appUcation of the mathematical statistical 
methods to problems dealing with inherited characteristics in 
organic life. 

A very interesting attempt to develop a theory of frequency 
curves has been made by the Dutch astronomer, Kapteyn, in his 
"Skew Frequency Curves in Biology and Statistics" (Groningen, 
1912). Kapteyn's theory which has much in common with 
Edgeworth's method of translation introduces a new idea in the 
generation of a frequency curve by making the size of the individ- 
ual object depend not alone upon the sources influencing a collec- 
tion of such individuals but also upon the size of the object at a 
previously given time t. This idea of introducing the time factor 
in the theory of probabilities is, however, more justly credited to 
the French mathematician, Bachelier, whose large treatise on 
probabilities of which the first volume appeared a few years ago 
has introduced some new thoughts regarding the conception of 
continuous probabilities which are bound to strongly influence 
the whole theory. 

Before closing this necessarily brief and incomplete historical 
note we wish to mention the close connection of the theory of 
frequency curves with that of integral equations. Since the 
appearance of the epoch-making memoirs by Fredholm the theory 
of integral equations has occupied a central position in math- 
ematical analysis. This youngest branch of higher analysis has 
already found numerous practical applications in physics and 
chemistry and possesses equally important properties in the way of 
solving numerous statistical problems. In fact, the whole theory of 
frequency curves and correlations can be reduced to the solution 
of a few integral equations whose constants contain all the char- 
acteristic properties of the frequency distribution. On the basis 
of this principle, a complete theory of frequency curves could be 
presented on a single book page. 



CHAPTER XIV. 

THE MATHEMATICAL THEORY OF FREQUENCY CURVES. 

102. Frequency Distributions. — If N successive observations 
originating from the same essential circumstances or the same 
source of causes are made in respect to a certain statistical variate, 
X, and if the individual observations Oi{i = 1, 2, 3, .... N) are per- 
muted in their natural order in accordance with their magnitude 
then this particular permutation is said to form a frequency dis- 
tribution of X and is denoted by the symbol F(x) . 

The relative frequencies of this specific permutation, that is the 
ratio which each absolute frequency or group of frequencies bear 
to the total number of observations, is called a relative frequency 
function or probability function and is denoted by the symbol <pix). 

If the statistical variate is continuous or a graduated variate, 
such as heights of soldiers, ages at death of assured lives, physical 
and astronomical precision measurements, etc., then 

dz(p{z) 

is the probability that the variate x satisfies the following relation 

z— -dz<x<z+-dz 

or that X falls between the above limits. 

If the statistical variate assumes integral (discrete) values only 
as for instance the number of alpha particles discharged from cer- 
tain radioactive metals and gases, such as polonium arid helium, 
number of fin rays in fishes, or number of flower petals in plants, 
then (p{z) is the probability that x assumes the value z. From the 
above definitions it follows directly that 

(a) F(z) = N(p{z) (Integral variates) 

(b) dzF{z) = Nip{z)dz (Integrated variates) 

Interpreting the above results graphically we find that (a) will 
be represented by a series of disconnected or discrete points while 
(b) will be represented by a continuous curve. 

As to the function <p{z) we make for the present no other 
assumptions than those following immediately from the customary 

188 
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definition of a mathematical probability. That is to say the 
function ;p{z) must be real and positive. Moreover, it must also 
satisfy the relation 

+ 00 

J ip{z)dz = 1, 



or in the case discrete variates: 

fiz) = 1, 



2 



which is but the mathematical way of expressing the simple 
hypothetical disjunctive judgment that the variate is sure to 
assume some one or several values in the interval from — = to 
+ 0° . The zero point may be arbitrarily chosen and need not 
coincide with the natural zero of the number scale. Thus for 
instance if we in the case of Danish recruits choose the zero point 
of the frequency curve at 170 centimeters an observation of 
180 centimeters would be recorded as + 10 and an observation of 
160 centimeters as — 10. 

103. Parameters considered as Symmetric Functions. — In 
regard to a frequency function we may assume a priori that it will 
depend only upon the variate x and certain mathematical rela- 
tions into which this variate enters with a number of constants 
Xi, A2, Xs, X4 . . ., symbolically expressed by the notation 

F{x, Xi, X2, X3, X 4 . . .) 

where the X's are the constants and x the variate. 

All these constants or parameters are naturally independent of 
X and represent some peculiar properties or characteristic essen- 
tials of the frequency function as expressed in the original observa- 
tions Oi {i = 1, 2, 3, . . .N). We may, therefore, say that each 
constant or statistical parameter entering into the final math- 
ematical form for the frequency function is a function of the 
observations o^. This fact may be expressed in the following 
symbolic form 

Xi = Si (oi, 02, 03, . . . Oat) 

X2 = S2 (Oi, O2, O3, . . . Om) 



Xat = Sn (oi, 02, 03, . . . On) 
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But from purely a priori considerations we are able to tell some- 
thing else about the function >Si . (^ = 1, 2, 3, ... A^^). It is only 
when permuting the various o's in an ascending magnitude accord- 
ing to the natural number scale that we obtain a frequency func- 
tion. This arrangement itself has, however, no influence upon 
any one of the o's which were generated before this purely arbitrary 
permutation took place. The ultimate and previously measured 
effects of the causes as reflected in each individual niunerical 
observation, Oj, depend only upon the origin of causes which form 
the fundamental basis for the statistical object tmder investigation 
and do not depend upon the order in which the individual o's occur 
in the series of observations. 

Suppose for instance that the observations occurred in the 
following order 

Ol, 02, 03, . . . On 

By permuting these elements in their natural order we obtain the 
frequency distribution F(x). But the very same distribution 
could have been obtained if the observations had occurred in any 
other order as for instance 

07, O9, Oiv, . . . O3 . . . Ol. 

so long as all of the individual o's were retained in the original 
records. Or to take a concrete example as the study of the number 
of policyholders according to attained ages in a life assurance 
office. We write the age of each individual policyholder on a small 
card. When all the ages have been written on individual cards 
they may be permuted according to attained age and the resulting 
series is a frequency function of the age x. We may now mix these 
cards just as we mix ordinary playing cards in a game of whist, 
and we get another permutation in general different from the 
order in which we originally recorded the ages on the cards. But 
this new permutation can equally well be used to produce the 
frequency function if we are only sure to retain all the cards and 
do not add any new cards. 

The various functions S(oi, 0?, 03, . . . On) are, therefore, sym- 
metric functions, that is functions which are left unaltered by 
arbitrarily permuting the TV elements 0, and no interchange what- 
ever of the values of the various o's in those symmetric functions 
can have any influence upon the final form of the frequency func- 
tion or frequency curve, F{x) . 
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We now introduce under the name of power sums a certain well- 
known form of fundamental symmetrical functions defined by 
the following relations 

50 = 0? + o» + o» + . . . 0^ = AT 

51 = o\ + o\ -\- o\ -\- . . . 0^ = 1,o\ 
s^=ol + ol + ol + ...o§,= ^O'i 



Moreover, a well-known theorem in elementary algebra tells us that 
every symmetric function may be expressed as a function of 

Sl, S2, S3, . . . SjSf. 

From this theorem it follows a fortiori that we are able to express 
the constants X in the frequency curve as functions of the power 
sums of the observations. While such a procedure is possible, 
theoretically at least, we should, however, in most cases find it a 
very tedious and laborious task in actual practice. It, therefore, 
remains to be seen whether it is possible to transform these sym- 
metrical functions of the power sums of the observations into some 
other symmetrical functions, which are more flexible and workable 
in practical computations and which can be expressed ia terms of 
the various values of s. 

104. Semi-Invariants of Thiele. — ^It is the great achievement 
of Thiele to have been the first mathematician to realize this 
possibility and make such a transformation by introducing into 
the theory of frequency curves a peculiar system of symmetrical 
functions which he called semi-invariants and denoted by the 
symbols Xi, X2, X3 . . . 

Starting with the power sums, Sj, Thiele defines these by the 
following identity 

SoeLi !? 13 =s„-|-_ + _-f._-+... (1) 

which is supposed identical in respect to w. 

Since Sj = So' the right hand side of the equation may also 
be written as e"'" + e"*' + e"'" + ...= Se°i". 
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Differentiating (1) with respect to w we have 



XlW , XjOJ^ , A30J' 



Soe 



ilr: _L ' '"^ _L "■'"' _L, I- \ \ 2 

U~^|2"^[3~^'''-. _, A2a; A3C0 






Multiplying out and equating the various coefficients of equal 
powers of co we finally have:^ — 

Si = XiSo 

52 = XiSi + X2S0 

53 = X1S2 + 2 X2Si + X3S0 

54 = X1S3 + 3 X2S2 + 3 X3S1 + X4S0 



where the coefficients follow the law of the binomial theorem. 
Solving for X we have 

Xi = Si : So 

X2 = (S2S0 - Si) : So 

X3 = (S3S0 - 3S2S1S0 + 2s?) : So 

X4 = (S4S0 — 4S3S1S0 — 3siso + 12s2sfso — 6st) : sq 



The semi-invariants X in respect to an arbitrary origin and 
unit are defined by the relation 

Soe- ' - '" =e'""+e''"+e''="+ .... 

where Oi, 02, 03 . . . are the individual observations. 

Let us now change to another coordinate system with another 
unit and origin defined by the following linear transformation 

o'i = aOi+ c 

The semi-invariants in this new system are given by the relation 

Xi'o? Xz'a)^ Xs'w^ . 

Soe - - - "" = e'"'"+e°^'"+.... 

_ g(aoi + c)w 1 (002 + 0)0)1 
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Since the various values of X' do not depend upon the quantity 
CO we may without changing the value of the semi-invariants re- 
place CO by CO : a in the above equations which give 

X CO X M' X.«o3 01 w u 

Soe~ - - = e + e -\- e +.... 



) I 1 cto Xia> , Xsa 

- o.u 02U oau — 75" +72 

Le +e +e +..J = e"soe- 



= e 

Taking the logarithms on both sides of the equation we have 

X/w X2'co^ A/co^ _ ceo Xiw X2Co^ Xsco^ 

Differentiating successively with respect to co we have 
Xi' Xz'co Xs'co^ ^ I ^ I ^2" XsCO^ 

X2' , Xs'co , X/co^ ^ '■'' 



a 



X3' X/co \ I \ / I 

-^ H r + . . . = X3 + X4 CO + . . . 



Letting co = we therefore have 



— !- = — + Xi, or Xi' = aXi + c 
a a 

-| = X2, or X2' = 0^X2 



-i = X3, or X3' = a^Xs 



from which we deduce the following relations 

Xi(aa; + c) = aXi(x) + c 

\{ax + c) = a''Xr(x) f or ?• > 1 

We shall for the present leave the semi-invariants and only ask 
the reader to bear in mind the above relations between X and s, 
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of which we shall later on make use in determining the constants 
in the frequency curve <p{x). 

Before discussing the generation of the total frequency curve it 
will, however, be necessary to demonstrate some auxiliary math- 
ematical formulae from the theory of definite integrals and integral 
equations which will be of use ia the following discussion as 
mathematical tools with which to attack the collected statistical 
data or the numerical observations. 

105. The Fourier Integral Equation. — One of these tools is 
found in the celebrated integral theorem of Fourier, which was 
the first integral equation to be successfully treated. We shall in 
the following demonstration adhere to the elegant and simple 
solution by M. Charlier. Charlier in his proof supposes that a 
function, i^(co), is defined through the following convergent series. 

F( CO) = a [/(o) + /( 0)6""' + /(2 a)e^""' + . . . 



or /^(oj) = a y- ^ /(aTO)e° 



(2) 



where 



We then see by a well known theorem of Cauchy that the 
integral 

+ CO 

(3) 



/(w) = f f{x)e'"''dx '■ 



is finite and convergent. If we now let ma = x and let a = 
as a limiting value, a becomes equal to dx and /(am) =/(x). 
Consequently we may write 

Urn F{w) = I (a). 
a = o 

Multiplying (2) by e~''""'daj and integratiag between the 
limits — it/ a and + tt / a we get on the left an expression of the 

+ic/a 

form I F{w)e~^''^d(a and on the right a sum of definite integrals 

-It/a 

of which, however, all but the term containing /(r a) as a factor 
will vanish. This particular term reduces to 

' See Goursat: Mathematical Analysis (English Translation, New York), 
page 364. 
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—X a 

al/raio.' or 27rj\ra) 

Hence we have 

— X s 

— X G 

B\' loTTiiis: a oonverire toward zero and bv the substitution 
ra = X this equation reduces to 






We then have, if we introduce a new fvmction c\^'^ defined by 
the ji-.up.e relation: 

\ 2tv.(^- = limfv'i^ or 



1 '/' 

1 r 



(,o;i^ 



(5b) 



Chariier has sjCiresTtxi the i.a:ne conjugated Fourier f:r.4.-::o', of 
j\j for the exprc^siorj ^ ^• 

The <:\rj;j:ioiis 5a^ and 5b ;vre known ijs inic;-,-:^ tquatk:< of 
the ,'. ■^: kind. The rxpre?cfion f~ (or t """'') is known ;ss the 
HM ■.<;.-:; of :he equation. If in ^^ob) we know the value of ^ v^' we 
:v.o sb'.e :o vie:e:nii:'.e /,z Inverse'.}", if we know ;\j'» we may 
Gnd ^ .-■ from o;i\ 

106. Frequency Function as the Solution of an Integral Equa- 
tion. — ^We are now in a ix>si:io::i lo make use of the se^.l:-i:•.^^l:•i;^uts 
of Tliie'.e. which hitherto in our disoussiou have appeared ;is a 
rather diseonneoted and alien member. On prj^re 191 we s^iw 
that the se:ri-ir.v;v. :a:.:s could be expressed by the relraion 

.1- 2- J- ■■■ ,- 

e- - =-e 

when o,-(i = 1.2.3.... denoti^ the individual observations. 
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The definition of the semi-invariants does not necessitate that 
all the o's must be different. If some of the o's are exactly alike it 
is self-evident that the term e"*" must be repeated as often as 
Oi occurs among all of the observations. If therefore N(p(oi) 
denotes the absolute frequency of Oj where <p(oi) is the relative 
frequency function, then the definition of the semi-invariants 
may be written as: — 

Xi 1X2 ,1X3 , , 

For continuous variates, x, the above sums are transformed into 
definite integrals of the form 



X, , X2 ,, Xi ,, +<; +°? 

— CO H £0'-4- — tO"+ . . . 

|1 ^12 13 ^ 

e - 



-(-ao ~r^*' 

/ (p(x)dx = I (p(x)e''"dx 



Let us now substitute the quantity -\/-lw, or ico, for w in the 
above identity. We then have : — 



X, . , X, ., , , X, ., , , +"" +^ 



J(p{x)dx = J (p(x)e'''"dx 



under the supposition that this transformation holds in the com- 
plex region in which the function is defined. 

In this equation the definite integrals are of special importance. 

The factor / (p{x)dx is, of course, equal to unity according to 

— GO 

the simple considerations set forth on page 189. The integral 
on the right hand side of the equation is, however, apart from the 
constant factor \/27r nothing more than the ^(co) function in the 
conjugate Fourier function if we let <p{x) = f(x), and 

e~ - - = -v/27r(^(w) 

According to (5b) we may, therefore write f{x) or <p{x) as 

' ^ Xi . , X2 .„ „ . X3 ., , . 
1 P rr *"+ n> » " + .•^»'w'+ • • ■ 

(p{x) = H~ / ^ e d(j3 

.— CO 

as the most general form of the frequency function (p{x) expressed by 
means of semi-invariants. (See Appendix.) 
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107. The Normal or Laplacean Probability Function. — The 

exactness with which <p{x) is reproduced depends, of course, upon 
the number of X's we decide to consider in the above formula. 
As a first approximation we may omit all X's above the order 2 
or all terms in the exponent with indices higher than 2. Bearing in 
mind that i^ = — 1 we therefore have as a first approximation 

, , 1 / iui(\i -x) - -p^ 0)2 , 

JiX>J 

— CD 

This definite integral was first evaluated by Laplace by means of 
the following elegant analysis. Using the well known Eulerean 
relation for complex quantities the above integral may be written as 

-|-co -|-oo 

e 2 cos,[{\x — x)(xi\do}-\-i I e "^^ sva[{\\ — x)(i3\d(i} 

— 00 — 03 

The imaginary number vanishes because the factor e ^ 
is an even function and sin [(Xi — a;)co] an uneven function, and 
the area from — 0= to will therefore equal the area from to 
+ <" , but be opposite in sign, which reduces the total area from 
— ':o to + "= or the integral in question to zero. 

In regard to the first term, similar conditions hold except that 
cos [(Xi — x)w] is an even function and the integral may hence 
be written as 

+00 



-/ 



e 2 cos (rco)c?c<j where r = Xi 
6 



Regarding the parameter r as a variable and differentiating / in 
respect to this variable we have 

+00 

— = ^— I (— X2Coe 2 j sin (^^) ^t^ 
dr M^ 

From this we have by partial integration: — 

+00 



^ = Afe-^"^ 



dr \i 



sm (rco)aa) - v- / e ^ ggs {rcojdo} 
° 



^ rl 1 dl r 

= — — or 7j- = -t- 
X2 / dr X2 
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From which we find 

log / = - -^ + log A 

■^ A2 

where log A is a constant. Hence we have: — 

I = Ae^>^- 
In order to determine 4 we let r = and we have 

00 

-X. 



Io = A 





This finally gives the expression for (Po{x) in the following form: 

_ (X,-x) ' 

1 2K, 

as a preliminary approximation for the frequency curve (p{x). 

The first mathematical deduction of this approximate expression 
for a frequency curve is found in the monumental work by Laplace 
on ProbabiUties, and the function (Po{x) entering in the expres- 
sion (po{x)dx, which gives the probability that the variate will fall 
between x — \dx and x + \dx, is therefore known as the Lapla- 
cean probability function or sometimes as the Normal Frequency 
Curve of Laplace. The same curve was, as we have mentioned, 
also deduced independently by Gauss in connection with his studies 
on the distribution of accidental errors in precision measurements. 

Laplace's probabihty function, <Pa(x), possesses some remark- 
able properties which it might be well worth while to consider. 
Introducing a slightly different system of notation by writing 
\i = M and \/X2 = o", cpoix) reduces to the following form: 

2 - (i - Jlf)2:2a-2 

which is the form introduced by Pearson. 

The frequency curve, (Poix), is here expressed in reference to a 
Cartesian coordinate system with origin at the zero point of the 
natural number system and whose unit of measurement is also 
equivalent to the natural number unit. It is, however, not neces- 
sary to use this system in preference to any other system. In fact, 
we may choose arbitrarily any other origin and any other unit 
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standard without altering the properties of the curve. Suppose, 
therefore, that we take M as the origin and o" as the unit of the 
system. The frequency function then reduces to 

where 2 = (x — M) : a 

Since the integral of (po{z) from — c° to + <= equals unity 
the following equation must necessarily hold. 

+00 



J e~''"-^dz = V2ir 



This latter result maj% however, be deduced independently of 
the fact that (Po{~) happens to be a probability function. The 
above definite integral is a form well known from the calculus and 
equals \/"-7r. It serves therefore as an independent check of our 
calculations. 

108. Hermite's Polynomials. — The Laplacean Probability Curve 
possesses, however, some other remarkable properties which are of 
great use in expanding a function in a series. Starting with <p„{z) 
we may by repeated differentiation obtain its various derivatives. 
Denoting such derivatives by ^1(2), <P2{z), <Pi{z) . . . respectively 
we have the following relations.^ 

(po{z) = e 

<piiz) = - Zifoiz) 

cpiiz) = {z- - l)(po{z) 

(pi{z) = - {z^ — Zz)(po{z) 

iPi{z) = (2" -6s=+ 3)^0(2) 



and in general for the nth derivative: — 

<Pn\z) = (- 1) 2 2 ' 

n{n - 1) (n- 2) (n -3)2""* 



n{-n -!)(«- 2) («. - 3) {n - 4) {,n - 5)2" " ^ 
2-4 6 



2 -4 

+ • ■ <Po(2) 



' In the following eomputations we have omitted temporarily the constant 
factor 1 : -\'^27r of ^oU) and its derivatives. 
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It can be readily seen that the derivatives of (Po{z) are repre- 
sented throughout as products of polynomials of z and the func- 
tion <po{z) itself. The various polynomials 

H,{z) = 1 
HM = z 
Hiiz) =3^-1 
H^iz) = (z^ - 32) 
Hi{z) = (z^ - 6z^ + 3) 

and so forth are generally known as Hermite's polynomials from 
the name of the French mathematician, Hermite, who first intro- 
duced these polynomials in mathematical analysis. 

The following relations can be shown to exist 

H„ + iiz) + zH^iz) + nH„ _ 1(2) = 
and 

dm,Xz) zdHM 



dz'^ dz 



nHniz) = 



from which we successively may compute the various H{z). 

A numerical 10 decimal place tabulation of the first six Hermite 
polynomials for values of 2 up to 4 and progressing by intervals 
of 0.01 is given by Jorgensen in his aforementioned "Frekvens- 
flader og Korrelation." 

109. Orthogonal Functions. — There exist now some very 
important relations between the Hermite polynomials and the 
derivatives of (foiz), or between Hn{z) and <Pn{z). 

Consider for the moment the two following series of functions 

<Po{z), <pi{z), cpiiz), tpsiz) (Pi{z), . . . 
Ho{z), HM, H2{z), H,{z) H,{z), . . . 

where (Pn{z) = (— 1)" Hn{z) <po(z) and where lim <Pniz) = for 3 = ±00 

We shall now prove that the two series <Pniz) and Hn{z) form 
a biorthogonal system in the interval — <= to + ot> , that is to say 
that they are 

(1) real and continuous in the whole plane 

(2) no one of them is identically zero in the plane 

(3) every pair of them, (p„{z) and HJ^z) satisfy the relation 

J <Pn{z)H^{z)dz = (n^m) 
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We have the self-evident relation 

4-00 4-03 4-ao 

J Hrn{z)iPn{z)dz = fHmiz)H,iz)<Po(z)dz = fH^{z)<p^{z)dz 

— OO — 00 —00 

Since this relation holds for all values of m and n it is only neces- 
sary to prove the proposition for n> m. For if it holds for n > m 
it will according to the above relation also hold for n <m. 
By partial integration we have: — 

-f-co 4-co 4-00 

J H^{z)<p^{z)dz = ri?Jz)^„_i(2)l— jH'^{z)^„^^{z)dz 

when H'm{z) is the first derivative of Hm(z). 

The first member on the right reduces to since ^„ _ i (z) =0 
f or 3 = ± <K and because H^ is of a lower order than (p^. We 
have therefore: — • 

4-OD -f-CO 

j HJz)(Pn{z)dz = - J H'm{z)(Pn - l{z)dz 

CO 00 

-f-oo -f-oo 

J H'^iz) (pn-iiz)dz= - J H'„(z) (Pn - 2{z)dz 

CO — OO 

4-00 4-00 

/ H"r^{z)(Pr,-2iz)dZ = - J HZ{z)<Pn-ziz)dz 



Continuing this process we obtain finally an expression of the form 

4-00 +0° 

f Hr,{z)Vn{z)dZ = (- ir^'fHj'^''''<Pn-m^liz)dz 
_oo 

where hJ'^'^^^ (z) is the m+1 derivative of H„ (z) and 
„ _ TO _ 1 >. 0. Since H^iz) is a polynomial in the m^ degree 
its m + 1st derivative is zero, and we have finally that 

4-0O 

fH^iz)^niz)dz = 
for all values of m and n where n ^ m. 
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For m = n we proceed in exactly the same manner, but stop 
at the m— integration. We have, therefore, by replacing m by n 
in the above partial integrations 

-|-a> -|-oo 

J H,{z)ip,,{z)dz = (-1)" jH„^^\z)ip,_^{z)dz = 

— 00 — CO 

+ 00 

(~ir fH„^''\z)<Po(z)dz 

— 00 

The n— derivative of ^n(^) is however nothing but a constant and 
equal to i^. Hence we have finally 

-|-00 CO + 

jHr,{z)ipMdz = (- 1)"^^ fe-"-'dz = (- l)"h\/2^ 

— CD — OO 

The above analysis thus proves that the functions Hm{z) and 
<p„{z) are biorthogonal to each other for all values of n different 
from m throughout the whole plane. 

110. The Frequency Function expressed as a Series. — We 
can now make use of these relations between the infinite set of 
biorthogonal functions UJyZ) and ^„(3) in solving the problem 
of expanding an arbitrary function ip{z) in a series of the form 

(p{z) = CoiPoiz) + Ci^i(2) + C2(pi{z) + . . . , 

the series to hold in the interval from — <» to + o= . 

If we know that ipiz) can be developed into a series of this 
form, which after multiplication by any continuous function can 
be integrated term for term, then we are able to give a formal 
determination of the coefficients c. 

This formal determination of any one of the c's, say Ci, consists 
in multiplying the above series by Ui{z) and integrating each 
term from — oo to + °° • All the terms except the one containing 
the product Hi(z)(pi{z) vanish and we have for Cj. 

ip{z)Hi{z)dz J <p{z)Hi{z)dz 

— CO — CO 

J <pi{z)Hi{z)dz (- 1)'KV2^ 

— 00 

It will be noted that this purely formal calculation of the co- 
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efficients c,- is very similar to the determiaation of the constants in a 
Fourier Series, where as a matter of fact the system of functions 

cos z, cos 2 2, cos Z z, . . . 
sin z, sin 2 z, sin 3 s, . . . 

is biorthogonal in the interval ^ 2 ^ 1. 

But the reader must not forget that the above representation is 
only a formal one, and we do not know if it is valid. To prove its 
validity we must first show that the series is convergent and 
secondly that it actually represents ip{z) for all values of z. 

This is by no means a simple task and it cannot be done by 
elementary methods. A Russian mathematician, Vera Myller- 
Lebedeff, has, however, given an elegant solution by means of some 
well-known theorems from the Fredholm integral equations. She 
has among other things proved the following criterion : — ■ 

"Every function ipiz) which together with its first two deriva- 
tives is finite and continuous in the interval from — ■== to + °° 
and which vanishes together with its derivatives for z = ± <» 
can be developed into an infinite series of the form: — 



^(2) =^Cie-''--^Hlz) 



where Hiiz) is the Hermite polynomial of order i." 

111. Derivation of Gram's Series. — It is, however, not our 
intention to follow up this treatment which is outside the scope of 
an elementary treatise like this and shall in its place give an 
approximate representation of the frequency function, <p(2), by 
a method, which in many respects is similar to that introduced 
by the Danish actuary Gram in his epoch-making work "Udvik- 
lingsraekker," which contains the first known systematic develop- 
ment of a skew frequency function. Gram's problem in a some- 
what modified form may briefly be stated as follows: — Being 
given an arbitrary relative frequency function, (p{z), continuous and 
finite in the interval — • c» to + <" (and which vanishes for 2 = ± °= ) 
to determine the constant coefficients c<„ ci, C2, C3 . . . in such a way 
that the series 

Co<Po(z) I CKPijz) , Ci iMz)^ , c„<p„(z) 

V<Po{z) \/<poiz) -Vfoiz) s/iPoiz) 



= v^FirS^'-^^^^) 
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gives the best approximation to the quantity <p{z) : \^<po{z) in the 
sense of the method of hast squares. That is to say we wish to 
determine the constants c in such a manner that the sum of the 
squares of the differences between the function and the approx- 
imate series becomes a minimum. This means that the expression 

must be a minimum. 

On the basis of this condition we have 



where the imknown coefficients c must be so determined that 



+ 0. 



IS a mmimum 



Taking the partial derivatives with respect to Cj we have 






-|-CO -[-00 



Now since 

-j-co -j-oo 



J \u{z)Jdz = J I c/[h„(2)] + Ci2|^ffi(3)J 

+ ...C,? [h„(2)]' I 'Po{z)dz 

we get 

-|-00 +CO 
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where the latter integral equals / <Pi{z)Hi(z)dz = (— iy\i'\/2Tr 



Equating to zero and solving for Cj we finally obtain the follow- 
ing value for c< 



( - D* 

Ci = 



-l-to 

j^^ J ip{z)Hi{z)dz for t = 1, 2, 3, . . . 



This solution is gotten by the introduction of -s/ (po{z) which 
serves to make all terms of the form Ci(pi{z) : y/ (po{z) equal to 
\^<Po(z)ciHi{z) {i = 1, 2, 3, ... n) orthogonal to each other in the 
interval ~ °= to + <" . 

In all the above expansions of a frequency series we have used 

the expression (po(z) = e as the generating function (see 

footnote on page 199), while as a matter of fact the true value of 

(Po(z) is given by the equation <po{z) = e : y/^-K 

The definite integral on page (202) 

+ 00 -t-CD 

(—1)" fHi{z)<Pi{z)dz ^\iJe-''--^dz = iV2^ 



will therefore have to be divided by \/27r, and the value of the 
general coefficient c,- will henceforth be reduced to 

+ 0O 

f <p{z)Hi(z)dz 

— CD 

Ci = 



(— 1)1* 

where Hi{z) is the Hermite. polynomial of order i defined by the 
relation 

Hi{z) = z + 274 2 

- ^•(^• - 1) (i - 2) {i - 3) (i - 4) (t - 5)2' ~ ^ " 

2.4.6 "•"••• 
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On this basis we obtain the following values for the first four 
coefficients : — 

Co = / <p{z)dz = 1 

— CO 



ci= i-iy j <p{z)zdz : li 

— CD 

C2= i-~irf{z'-l)<p{z)dz:\2 

— 00 , 
+ CO 

C3 = {--iyfiz^ — 3z)<p{z)dz:\^ 

— CO 

+ C0 

C4 = (— l)'j{z* — 6z^ + 3z),p{z)dz : If 



112. Absolute Frequencies. — While the above development of 
an arbitrary frequency distribution has reference to (piz), or the 
relative frequency function, it is, however, equally well adapted 
to the representation of absolute frequencies as expressed by the 
function, F{z). If N is the total number of individual observa- 
tions, or in other words the area of the frequency curve, we evi- 
dently have 

Fiz) = N(p(z) or J F(z)dz = N f (piz)dz = N. 

— 00 — CO 

Since iV is a constant quantity we may, therefore, write the 
expansion of F{z) as follows : 

F{z) = n\ c„(po{z) + civi(z) -I- Ci<pi{z) + . . . 

= N^c,H,{z)e-"--^ 
where the coefficients c,- have the value 

+ CO 



N\ 



^ f F{z)Hi{z)dz, for i= 1,2,3, ... 
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and where 

AT = jF{z)dz. 

00 

Since all the Hermite functions are polynomials in 2: it can be 
readily seen that the coefficients c may be expressed as functions of 
the power sums or of the previously mentioned symmetrical func- 
tions s, where 

+00 



J /F{z)dz 



These particular integrals originally introduced by Thiele in the 
development of the semi-invariants have been called by Pearson 
the "moments" of the frequency function, F{z), and s^ is called 
the rth moment of the variate z with respect to an arbitrary origin. 
It can be readily seen that the moment of order zero or So is 

-|-oo -1-00 

s„ = j'z''F{z)dz = N j(p{z)dz = N 

— CO — 03 

Hence we have for the first coefficient Co 

-f-00 -\-(X> 



•„ = fF{z)dz : fF{z)dz = 1 



^00 



We are, however, in a position to further simplify the expres- 
sion for F{z). 

As already mentioned we are at liberty to choose arbitrarily both 
the origin and the unit of the Cartesian coordinate system for the 
frequency curve without changing the properties of this curve. 
Now by making a proper choice of this Cartesian system of refer- 
ence we can make the coefficients Ci and Cj vanish. In order to 
obtain this object the origin of the system must be so chosen that 

-f-00 4-00 

ci =^fzF{z)dz :j Fiz)dz = 

— — CO —00 

This means that the semi-invariant si : So = Xi must vanish. 
It can be readily seen that the above expression for \i is nothing 
more than the usual form for the mean value of a series of variates. 
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Moreover, we know that the algebraic sum (or in the case of con- 
tinuous variates, the integral) of the variates around the mean 
value is always equal to zero. Hence by writing for z the expres- 
sion {z — M) when M equals the mean value or Xi, we can always 
make Ci vanish. 

To attain our second object of making Ci vanish we must choose 
the unit of the coordinate system in such a way that the expression 

~ - J Fiz)H2(z)dz : J F{z)dz = 



Ci = 



which implies that 

-|-CO -^00 -|.oo 

[ J F{z)zHz -J F{z)dz\ : J F(z)dz = 

"——CO — OO — ' — 00 

or that S2 : si — 1 = 0, or when expressed in terms of the semi- 
invariants that 

X2 = (S2S0 — si^) : So^ = 1. 

But by choosing the mean as the origin of the system the term 
si : So is equal to and we have therefore X2 = c^ = S2 : So = 1. 
Hence, by selecting as the unit of our coordinate system V'X2 
or a, where a is technically known as the dispersion or standard 
deviation of the series of variates, we can make the second coeffi- 
cient C2 vanish. 
In respect to the coefficients C3 and C4 we have now 

4-00 -|-00 -|-00 

(- 1)^^ 



C3 



3 



Di z'F{z)dz - 3 J zF{z)dz^ : J F{z)dz 
— 00 —CO —I — CD 



1 S3 , . 
which reduces to — tk — . while 

1^ s„ 



-)-oo +°^ +CD -\-ca 

i="-|4 I J z'F{z)dz~<dJ z^F{z)dz+3j Fiz)dz\ J F{z)dz, 

— *— — 00 —00 CO -J — 00 



which reduces to 

1 ^Si 6S; 



]_ [si 6s2 I 3so1 L rs4 q1 

While the coefficients of higher order may be determined with 
equal ease it will in general be found that the majority of mod- 
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erately skew frequency distributions can be expressed by means 
of the first 4 parameters or coefficients. 

113. Coefficients expressed by Semi-Invaxiants. — We shall now 
show how the same results for the values of the coefficients may be 
obtained from the definition of the semi-invariants. Since we 
have proven that a frequency function, F{z), may be expressed 
by the series 

F{z) = ^Ci<pdz) 

we may from the definition of the semi-invariants write down the 
following identity: — 

Xico , X^o)2 -f-oo 

IT "IT r 

SoC - NJ e "" {Co(Po{z) + CKPiiz) + Oi(pi.{z) -\- . ..dz 

— oo 

where 'N is the area of the frequency curve. 

The general term on the right hand side of the equation will be 
of the form 

4- CO 

CrJ e"^(pr(z)dz 



— OD 



where the integral may be evaluated by partial integration as 
follows: — 

+ CD +<" 4-°° 

je^ipXz)dz = e'^^r - iW 1 — o:je'''tpr^i{z)dz, 
and where the first term on the right vanishes leaving 



fe'^'cpMdz = (— o:yfe"'<Pr-i(z)dz 

— 00 —00 

Continuing in the same manner we obtain by successive integra- 
tions 

4-00 4-<= 

(— uyfe ^<Pr- i{z)dz = (— oiYJe ^"(p, _ 2{z)dz 

— 00 — t" 

4-0) 4-<» 

(— oiYje"'iPr-2{z)dz = (— CO)' J e'^<pr-z{z)dz 
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from which we finally obtain the relation 

+ 00 4.C0 +CO 

I e ^(Pr{z)dz = ( — coy I e <Po{z)dz = ,—- j e dz 

— 03 —00 * —00 

This latter integral may be written as 
V7r2 J V2 



. C -|(2-co)2 (-0,)-- ^ ^ 



TV 

Consequently the relation between the semi-invariants and the 
frequency function may be written as follows: — 

SoB " = IS \ CoCico + Coco^ — CsOJ" + . . . le " , or 






= N CoCico + C2C0^ — CsOJ^ + . . . e ^ , I 

= iV (Co — CiCO + dOi^ — CsOj' + . . . . 



By successive differentiation with respect to w and by equating 
the coefficients of equal powers of w we get in a manner similar to 
that shown on page 192 the following results: 

Co = So : iV = So : So = 1 
ci = — Xi 



Ci 



Ci 



Ci 



=^|'x3+3(X2-l)Xi = Xi=*] 

= g r X4 + 4X3X1 + 3(X2 — l)^ + 6(X2 — l)Xi^+ Xi^l 



If we now again choose the origin at Xi or let Xi = and 
choose •v/X2 = 1 as the unit of our coordinate system we have: — 

-1 - 1 

Co = 1, Ci = 0, C2 = 0, C3 = |g A3, Ci = 1^ A4 

114. Change of Origin and Unit. — The theoretical develop- 
ment of the above formulae explicitly assumes that the variate, z, 
Ls measured in terms of the dispersion or y/X^iz) and with Xi(s) 
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as the origin of the coordinate system. In practice the observa- 
tions or statistical data are, however, invariabty expressed with 
reference to an arbitraiilj^ chosen origin (in the majority of cases 
the natural zero of the number scale) and expressed in terras of 
standard units, such as centimeters, grams, years, integral num- 
bers, etc. 

Let us denote the general ^•ariate in such arbitrarily selected 
systems of reference by .r. Our problem then consists in trans- 
forming the various semi-in-\-ariants, Xi(.r), A:(.r), X3(.r), Xtix), 
... to the system of reference with \i[z) as its origin and -\ '\i\.z) 
as its unit. Such a transformation may always be brought about 
by means of the linear substitution 

z = a.v + b 

which in a purely geometrical sense implies both a change of origin 
and unit. On page 193 we proved the following general prop- 
erties of the semi-invariants 

\,{z) = Mc-^+b) = aM.r) + b 
\(z] = \(ax+b) = a'M-r) 

Let us now write \i(.r) = M and \;(.r) = a'-, we then have the 
following relations: — 

Xi(2) = aM+b 
Xou) = a'-(j- 

Since the coordinate s>-.-^tein of reference must be chosen in such 
a manner that Xii.^) = and \ 'X:(^) = 1 we have 

aM -1-6 = 
acr =1 

1 — 1/ 
from which we obtain a =— and b = , wliich brings z on the 

form z = {X — .1/) : (T. while ip^z) becomes 

1 ~(x- M)-■.2(r- 

^ \ 2ircr 

Moreover, we have X^z) = Xr(.r) : a'' for all values of r greater 
than 2. We are now able to epitomize the computations of the 
semi-invariants under the following simple rules: 
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(1) Compute Xi(x) in respect to an arbitrary origin. The 
numerical value of this parameter with opposite sign is the origin 
of the frequency curve. 

(2) Compute Xiix) for all values of r equal or greater than 2. 
The numerical values of those parameters divided with {^\(,x)),' 
or (T', for r=2, 3, 4, ... are the semi-invariants of the frequency 
curve. 

Remarks on Nomenclature and Tables. — We shall now briefly dis- 
cuss some of the geometrical properties of the Laplaoean probability curve 
^o(z)=e~^^'2 and its derivatives, <p»(z) =fl'»(3)<Po(z), for i=3, 4, 5 . . 
Writing ^o(z) and its derivatives as: 

(po(z)=e—^--2:y'2Tr 

<Pi{z) = -z(Pa{z) 

^2(z) = (-l)2(z2-l)<Po(z) 

^3(Z)= (-1)3(23 -3Z)^0(Z) 
'P4(z) = ( -1)4(04 -6z2+3)^0(z) 



we readily-Botice ihat both <Po(z) and all its derivatives of even order are 
even functions of z while all the derivatives of uneven order are uneven 
functions. 

The Laplacean probability function which occurs as a factor in all the 
expressions is in itself a single valued positive function with a maximum 
point at z = and a point of inflection at z = ±l and approaches the ab- 
scissa axis asymptotically in both positive and negative direction. At 
z =0 we have <Po(z) = l:^27r =0.3989. At plus or minus one, ^o(z) is less 
than 0.25 at z = ±2, ^o(z) is nearly 0.05, at plus or minus 3 about 0.004 
and at z = ±4 only 0.0001. 

In regard to the third derivative, <P3(z) =fl'3(z)<Po(z)]weflnd that it pos- 
sesses a maximum or minimum in the neighborhood of z = -|-0.7 and z = 
—0.7 respectively, it crosses the abscissa axis in the neighborhood of the 
points z = ±1.75 and approaches the abscissa asymptotically in both posi- 
tive and negative direction. 

The f oittth derivative has a major maximum point at z = 0, it crosses the 
abscissa axis from positive to negative direction in the neighborhood of 
z = +0.75, attains a minimum at about z = it 1.35, it crosses again the 
abscissa (this time from negative to positive direction) in the neighbor- 
hood of z = ±2.3, attains a secondary or minor maximum around z = ±2.86 
and begins then to decline until it ultimately approaches the abscissa axis 
asymptotically. 

These geometrical properties of the Laplacean frequency curve and its 
derivatives are, however, much more readily vizualized in the accom- 
panying diagram which needs no further explanation. We wish, however, 
to call the attention of the reader to the wavelike form of the various 
curves, which is strongly reminiscent of the form of functions encountered 
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ia hannoiiie analysis or in the expansions in Fourier Series, an analogy 
whicli we had occasion to mention in the discussion of the orthogonal 
properties of the Hermite polynomials and the derivatives of the Lapla- 
cean function. 
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In order to facilitate practical numerical calculations it is, however, 
necessary to have an e.xtensive set of numerical tables for ^0(2) and its 
derivatives. This fact was already noted bj- Laplace who more than 25 
years prior to the publication of the memoirs by Gauss on the normal 
error cun^e advocated the construction of a table of numerical values of 
the integral. 

1 /*z 

V 27rJ 



The first set of such tables was constructed by the astronomer, Kramp 

and modified forms of these tables are found in nearly aU treatises on least 

squares and standard texts on probabilities. The most recent set of 

•This fact, as pointed out by Pearson, deflnit^lj- establishes Laplace's priority of 
discovery of the probability curve. 
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tables of this integral are those of Sheppard in (Tables for Biometricians, 
edited by Karl Pearson) where the variate z is expressed in units of c, 
or the dispersion. Sheppard has also computed a table of the numerical 
values of ^Poiz). 

In order to use the Gram — Charlier expansion in serial form, it is, how- 
ever, necessary to compute tables for the derivatives — up to the fourth 
order. A brief table of the first 6 derivatives is already found in Thiele's 
earlier treatises. Charlier was, however, the first to supply an extensive 
set of tables to 4 decimal places for values of z up to 4 and progressing 
by intervals of 0.01 in his Researches on the Theory of Probabililies in the 
Meddelande for 1904. The most detailed tables are those of Jorgensen 
in his Frekrensflader og Korrelation which gives the values of <Po(_z) audits 
first 6 derivatives to 7 decimal places for values of z up to 4 and progress- 
ing by intervals, of 0.01. f The German astronomer Bruhns has in his 
Kollektivmasslehre given a set of tables to 4 decimal places of the values 
of the definite integrals 



/ 



'Pi(z)dz fori= 0, 1, 2, 3, 4, 5 



The Gram-Charlier series gives us the frequency function in the form 
F(.z) = 'Zci^i{z) where the various coefficients a are expressed as moments 
or semi-invariants. As we have already pointed out the derivatives of 
uneven order are uneven functions and the derivatives of even order are 
even functions. The addition of such terms as 031^3(2), 05^5(2), . . .tends 
therefore to produce asymmetry or skewness from the normal form, while 
addition of the terms C41P4, ceVe, ■ does not alter the symmetrical form 
but tends to make it topheavy or flatten it around the neighborhood of the 
origin or mean value of the variate 2. The coefficient £3:8! (or X3: 3!) is 
technically known as the skewtiess, and C4:4! (or X4;4!) as the excess of the 
curve. No particular names have as yet been proposed for the semi- 
invariants of higher order. 



tl intend to publish a similar set of .'J decimal place tables in the second volume of 
this treatise. 



PART III 

PRACTICAL APPLICATIONS OF 
THE THEORY 



Note: — In the following pages the factorial [^ = 1-2-3-- reis replaced 
by the symbol re! and the exponent J4n^ in the exponential expression e^+J^"* 
must be interpreted as re^:2 and not as l:2n''. 



CHAPTER XV. 

THE NUMERICAL DETERMINATION OF THE PARAMETERS 

115. General Remarks. — The previous investigations on 
frequency functions have all been more or less of a piirely 
theoretical nature. In the present chapter we now propose 
to show how the parameters are determined in actual practice 
from the individual observations or statistical summaries 

The determination of these unknown co-efficients or para- 
meters can — as emphasized by J0rgensen in his Frekvensflader 
og Korrelation — be looked at from two points of view. We 
may either consider the series as infinite in which case the ques- 
tion of determining the co-efficients becomes a problem in the 
Theory of Functions; or we may decide to consider a finite 
number of terms in the series and determine the coefficients so 
that the sum of the squares of the deviations of the resulting 
function from the observed statistical data becomes a minimum 
in the sense of the method of least squares. In this case ^he 
coefficients and not the moments or semi-invariants are repre- 
sentative of the observations. This latter method is the classi- 
cal method as used by Gram in his fundamental research on 
the expansion of frequency functions in series. A, biiief state- 
ment of the essential differences of the two methods may, how- 
ever, be of advantage to the reader. 

The method of moments requires that the areas of the defi- 

nite integrals of the form I x''F{z)ax must equal the areas 

of the observations which are expressed as power sums of the 

form 

1= » 
T,z'F{x) 

X= — 00 

while the method of least squares requires that 



/ 



+ 00 

\F{x)-Y.Ci<Pi{x)\Hx 
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must equal a minimum but does not necessarily impose any re- 
strictions as to the condition of equality of the observed and the 
computed areas as derived from the mathematical formula. 

The problem of determining the parameters in the sense of 
the method of least squares is therefore essentially a simple 
problem in maximum and minimum and is not necessarily — 
as some critics have imagined it to be — invariably interwoven 
with the law of errors as expressed by the Laplacean probability 
curve. It is, of course, true that the law of errors can be proved 
by regarding the principles of the method of least squares as 
an axiom, and inversely by accepting the law of errors as an 
axiom, i. e. by assuming the deviations from the observations 
and the functional law or mathematically determined frequency 
curve as being due to chance or random sampling, we may 
prove that the sum of the squares of such deviations actually 
becomes a minimum. This peculiar relation is, however, not 
necessary or required when we view the determination of the 
constants as a simple problem in maxima and minima. 

116. Remarks on Certain Criticisms. — Many English and American 
actuaries have of late shown a tendency to ignore the method of least 
squares and prefer to rely entirely upon the method of moments. Thus 
Palin Elderton in his otherwise useful and instructive work orxFrequeiicy 
Curves and Correlation states that the method is of little practical use, 
while ]Mr. D. Caradog Jones in his newly published First Course in Statis- 
tics claims the method of least squares "which is the traditional way of 
approaching all such problems, is shown to be impracticable in a large 
number of cases, either because the resulting equations cannot be solved, 
or, when they are capable of solution, because the labour involved would 
be colossal." This objection falls, however, to the ground in the case 
of the expansion of a frequency function in serial form because the un- 
known parameters, with the exception of the origin (the mean) and the 
unit (the dispersion) of the co-ordinate system, all appear as coefficients 
in true linear equations and hence are eminently adaptable to the treat- 
ment by least squares. 

The attitude of these writers is probably due to the fact that they work 
exclusively with the Pearsonian type of frequency curves where the 
function, F(z), is given as a closed expression rather than as an expansion 
in serial form. In nearly all Pearson's curve types there appear not more 
than four constants which in a measure accounts for the often successful 
application of the method of moments, although several of the examples 
presented by Mr. Jones in his book can scarcely be said to be recommenda- 
tive to Pearson's theory. On the other hand, it is a great drawback, not 
being able to have more than four constants at our disposal. Personally 
I have encountered a large number of statistical series where the Pear- 
sonian theory fails. This same fact is also noted by Jorgensen who on 
page 39 of his Frekuensjlader og Korrelation states that "jeg kender flere 
lagttagelsesraekker, hvor Pearson's Teori svigter totalt." 

In the purely theoretical development it matters but little whether we 
use moments or least squares in the expansion of a frequency function in 
a series; a fact which is readily seen from our previous demonstrations. 
In the purely practical work we have, however, this fact to consider. 



116] REMARKS ON CRITICISMS 217 

that the method of moments works exclusively with areas expressed as 
definite integrals, which are often difficult to determine in extremely 
skew distributions. And it is only by successive approximations that we 
in this manner reach a plausible result. Moreover, unless the observa- 
tions are very numerous, it is almost hopeless to compute the moments 
of higher order than the fourth, because of the very large errors arising 
from random sampling. Charlier in one of his monographs asserts that 
it is generally useless to compute moments of higher order than the 
second when the number of individual observations in the statistical 
series is less than 1000. Thiele gives the following brief rules: 

For the first and second semi-invariants rely exclusively on the observed 
data. 

For semi-invariants of higher order than 6 rely exclusively on theoretical 
considerations. 

For intermediate semi-invariants (between the 2d and 6th) rely partly 
upon theory and partly upon the observations. 

Caradog Jones, on the other hand, lustily ventures forth with moments 
of the fourth order, based upon 241, and in some instances even as low as 
180 individual observations. It is, therefore, no wonder that some of his 
results exhibit a somewhat poor "fit" with the original data. Another 
criticism which may be lodged against the method of moments as used 
by some adherents of the Pearsonian school, rather than by Pearson him- 
self, is that it works with unweighted observations, and the values of the 
extremities of the frequency curves are given the same weights as the 
more numerous observations in the immediate neighborhood of the mean. 

A second objection, raised among others by Elderton, is that the ex- 
pansion in serial form sometim.es gives rise to negative frequencies at the 
extreme tail ends of the curve, due of course to the fact that we have used 
a limited number of terms of the series. From purely practical con- 
siderations this objection counts little, because the observations at the 
extremeties are very few in number. It matters, for instance, but little 
in ordinary calculations of assurance premiums whether the upper limit 
of a mortality table is at 90 or at 100, and when Pearson from his curves 
actually has attempted to put an upper limit to the duration of human 
life, he has, to borrow an expression from the Danish biologist, Johannsen, 
begun to handle biology as mathematics and not with mathematics. In 
this connection it may also be noted that the Pearson Type I curve gives 
imaginary values beyond certain limits. When now certain followers of 
the Pearsonian school have considered this as an advantage and tried to 
interprete the limits as possbile values of repeated or presumptive observa- 
tions, it seems that such disciples have stretched their point a bit too far. 
It is not possible to see why negative results should be less plausible than 
imaginary results. Every student of ordinary algebra knows that the 
"imaginary" quantities are just as valid as the so-called "real" quantities, 
and it is probably the choice of this unhappy and ill-chosen nomenclature 
which has given rise to the above extravagant claims of some of the fol- 
lowers of Pearson. 

Finally some English and American actuaries have objected to the 
arbitrary choice of the parameters in the Gram or Charlier expansions. 
Unless I have completely misunderstood Mr. Elderton this is one of his 
his chief criticisms against Charlier's method. With my best intentions 
I cannot agree to this and will even go so far as to say that Mr. Elderton's 
criticism really speaks in favor of the methods put forth by the Scandi- 
navian scholars. As we have repeatedly emphasized in the preceding 
paragraphs, the arbitrary choice of ci and C2 amounts mathematically to 
the choice of an arbitrary origin and unit in the Cartesian co-ordinate 
system to which sm-ely no mathematician will make objections. Neither 
can objections be raised from the point of view of common sense. We 
might as well object to the meter as a unit of measure in preference to the 
yard, or to reckoning the solar time from the Greenwich meridian instead 
of the meridian of Paris. 
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The failure of the method of moments to compute with any degree of 
accuracy moments of higher order in the case of the majority of ordinary 
observations is probably the reason why some actuaries, especially in 
America, have maintained that the Gram or Charlier A type of fre- 
quency curves is not powerful enough to represent more than moderately 
skew frequency distributions. 

In spite of the incontrovertible fact that the most recent researches in 
the theory of integral equations have demonstrated beyond doubt that 
any frequency curve can be developed in convergent series by Hermite 
polynomials in conjunction with the normal Laplacean frequency curve 
an American actuary, Mr. Merwyn Davis, has taken the "bull by the 
horns" so to speak and boldly gone on record with the positive statement 
that "the Charlier series fails completely in oases of appreciable skewness." 
With all due respect for this young matador who has so boldly entered 
the ring to challenge the work of some of the most eminent mathemati- 
cians in the realm of integral equations I feel, however, that if Mr. Davis 
has actually succeeded in "throwing the bull" it is only in the sense as 
implied in the colloquial slang of his native America. In fact, we shall 
presently in some of our examples take up the challenge of Mr. Davis 
and show that the series he so curtly rejects can — by means of a simple 
transformation — be used on decidedly skew frequency distributions with 
even greater success than the Pearsonian curve types. 

With these preliminary remarks we shall now proceed to 
give several examples of the application of the Gram or Lapla- 
cean — Charlier frequency series, employing either the method 
of moments or the method of least squares in the numerical 
determination of the constants, although preference will be 
given to the latter method in cases of appreciable skewness or 
excess. 

It is, however, not our intention to go into details of the 
method of least squares and its relation to error laws, except 
in its connection with the problem of maximum and minimum. 
Any number of standard treatises are now available on the sub- 
ject, however, to which we may refer interested readers.* 

117. Charlier's Scheme of Computations. — The general 
formulae for the semi-invariants were given on page (192). 
In practical work it is, however, of importance to proceed along 
systematic lines and to furnish an automatic check for the cor- 
rectness of the computations. Several systems facilitating such 
work have been proposed by various writers but the most 
simple and elegant is probably the one proposed by M. Charlier 
and which is shown in detail with the necessary control checks 
on the following pages. Charlier employs moments, while we 
in the following demonstration shall prefer the use of the semi- 
invariants. 

* A particularly attractive presentation in English is found in David Brunt's Com- 
bination of Observations (Cambridge, 1918). 
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If we define the power sums of the relative frequencies <p{x) 

x'F(z)dx: j F{x)dx for r = 0,l,2,3 .... 

00 OO 

we find that the expressions for the semi-invariants as given on 
page (192) may be written as follows: 

/I3 = ma — Smzmi +2mi^ 

/I4 = m4 — 4m3mi — 3m2^ + 12m2mi^ — 6mi* 



The advantages of the Charlier scheme for the compuation 
of the semi-invariants lies in the fact that it furnishes an auto- 
matic check of the final results. If we expand the expression 
{x+lYF{x) we have: 

x^F{x) +Ax^F{x) +&x''F{x) +4:xF{x) +F{x) or 
2](x-|-l)'F(x) =S4-|-4s3-t-6s2+4si+so, 

which serves as an independent control check of the computa- 
tions. Moreover, another check is furnished by the relation 

9714 = /I -l-4mi/l3 + 6mr/'L2 -|- 3X2^ + mi^. 

In order to illustrate the scheme we chose the following age 
distribution of 1130 pensioned functionaries in a large American 
Public Utility corporation. 



Ages 


No. of Pensioners 


Ages 


No. of Pensioners 


35-39 


1 


65-69 


283 


40-44 


6 


70-74 


248 


45-49 


17 


75-79 


128 


50-54 


48 


80-84 


38 


55-59 


118 


85-89 


13 


60-64 


224 


over 90 


3 



Choosing the age of 67 as a provisional origin the Charlier 
scheme is shown in detail on next page. 

The computation below gives the numerical values of the 
frequency function which now may may be written as follows: 

F{x) = 1130[<Po(x) +.0258^3(x) +.0158^4(a;)] 
where -^i^^^T 

<poix) = = — ^e 

1.624 V 27r 
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Ages 


X 


F(x) 


xF{x) 


xWM 


x3F(i) 


x4.F(^x) 


{x+l)iF(,x) 


35-39 


-6 


1 


6 


36 


216 


1,296 


625 


40-44 


-5 


6 


30 


150 


750 


3,750 


1,536 


45-49 


-4 


17 


68 


272 


1,088 


4,352 


1,377 


50-54 


-3 


48 


144 


432 


1,296 


3,888 


768 


55-59 


-2 


118 


236 


472 


944 


1,888 


118 


60-66 


-1 


224 


224 


224 


224 


244 





65-69 





286 














286 



700 



708 1,586 4,518 15,418 



4,710 



70-74 


+1 


248 


248 


248 


248 


248 


3,968 


75-79 


+2 


128 


256 


512 


1,024 


2,048 


10,368 


80-84 


+3 


38 


114 


342 


1,026 


3,078 


9,728 


85-89 


+4 


13 


52 


208 


832 


3,328 


8,125 


90-94 


+5 


2 


10 


50 


250 


1,250 


2,592 


over 95 


+6 


1 


6 


36 


216 


1,296 


2,401 



430 



686 



1,396 3,596 11,248 37,182 



1,130 



-22 2,982 -922 26,666 41,892 



1.0000 -.0195 2.6378 



.8156 23.5699 



Xi =mi =-.0195 


m2= 2.6378 


S4= 26,646 


Xi2=TOj2= .0004 


-mi2= -.0004 


4s3 = -3,688 


Xi3=TOl3= .0000 


X2= 2.6374 = 0-2 


6s2= 17,892 


Xi4=mi4= .0000 


VX2"= 1.6240=0- 


4si = - 88 




4.2831 =o3 


so= 1,130 




6.9558=0-4 


41,892 


mami = -.0513, m3mi=-0159 TO22 =6.9580, m2mi2=.0010 


mi= -.8156 


mi = 23.5699 


X4= 2.6450 


-3OT2OT1 = .1539 


— 4m3mi = — .0636 


4miX3 = .0516 


2mi3 = .0000 


-Sms^ = -20.8740 


6mi2X2 = 0060 


X3 = —.UOi.1 


12to2OTi2 = .0127 


3X22=20.8677 




-6mi4 = .0000 
X4 = 2.6450 


mi'i = .0000 




23.5703 =m4 


C3=X3:cr3 = -.1545 


C4=X4:o-4=.3803 




-C3:3!=.0258 


04:4! = +.0158 





118. Comparison Between Observed Data and Theo- 
retical Values. — The next step is now to work out the numeri- 
cal values of F{x) for various values of x and compare such 
values with the ones originally observed. This process is shown 
in detail in the following scheme: 



119 


1 


OBSE 


IRVED 


AND THEORETICAL VALUES 


221 


(1) 


(2) 


(3) 


(4) 


(5) (6) 


(7) (8) 


(9) (10) Obs. 


X 


^-Xi 


(I— Xi):" 


^0(3) 


<P3(z) ip^U) 






-7 


-6.98 


-4.300 


.0001 


+ .0058 +.0170 +.0001 +.0003 


.0005 


-6 


e.98 


3.682 


.0005 


.0176 .0479 


.0015 .0008 


.0028 2 1 


-5 


4 98 


3.067 


.0036 


.0710 .1267 


.0018 .0020 


.0074 5 6 


-4 


3 98 


2.451 


.0198 


.1458 +.0602 


.0038 +.0009 


.0245 17 17 


-3 


2.98 


1.835 


.0741 


+ .0500 -.4345 


+ .0013 -.0068 


0686 48 48 


-2 


1.98 


1.219 


.1897 


-.3502 -.7036 


-:0090 .0111 


.1696 118 118 


-1 


-0.98 


-0.603 


.3326 


-.5287 +.3160 


-.0136 -.0050 


.3140 219 224 





+0.02 


+0.012 


.3989 


+ .0143 1.1963 


+ .0004 +.0189 


.4182 291 286 


+ 1 


1.02 


0.628 


.3273 


.5359 +.2584 


.0138 +.0041 


.3452 241 248 


+2 


2.02 


1.244 


.1835 


+ .3325 -.7157 


+ .0086 -.0113 


.1808 126 128 


+3 


3.02 


1.860 


.0707 


-.0605 -.4094 


-.0015 -.0065 


.0627 44 38 


+4 


4.02 


2.475 


.0186 


.1443 +.0703 


.0037 +.0011 


.0212 15 13 


+5 


5.02 


3.091 


.0034 


.0680 .1241 


.0018 .0020 


.0036 3 2 


+6 


6.02 


3.707 


.0004 


.0165 .0456 


.0004 .0007 


.0007 1 1 


+7 


+7.02 


+4.322 


.0001 


-.0050 .0162 


-.0001 +.0003 


.0003 



Column (1) gives the values of the variate x reckoned from 
the provisional origin, or the centre of the age interval 65-69. 
(2) is a; less the first semi-invariant, whereby the origin is shifted 
to the mean or X. Column (3) represents the final linear trans- 
formation : z = (x — Xi) :a- . 

Columns (4), (5) and (6) are copied directly from the stand- 
ard tables of J0rgensen or Charlier . Column (7) is (5) multiplied 
by 0.0258 or the product - [cs^^sCz) ] :3 !, while (8) is \c-,<pIz) ] :4 !. 

Column (9) is the sum of (4), (7) and (8) If we now distri- 
bute the area N = so or 1130 pro rata according to (9), we 
finally reach the theoretical frequency distribution expressed 
in 5-year age intervals and shown in column (10) alongside 
which we have inserted the originally observed values. Evi- 
dently the fit is satisfactory. It will be noted that the final 
frequency series is expressed in units of 5-year age intervals. 
This, however, is only a formal representation. By subdividing 
the unit inter\^als of column (1) in 5 equal parts, and by com- 
puting all the other columns accordingly, we get the theoretical 
frequency series expressed in single year age intervals. 

119. The Principle of Method of Least Squares. — The 

following paragraph purports to give a brief exposition of the 
determination of the coefficients in the Gram or Laplacean — 
Charlier series in the sense of the method of least squares as a 
strict problem of maxima and minima, wholly independent of 
the connection between the method of least squares and the 
error laws of precision measurements.* 

*In the following demonstration I am adhering to the brief and lucid exposition 
of the Argentinean actuary, U. Broggi, in his exellent Traite d' Assurances sur la Vie. 



222 DETERMINATION OF PARAMETERS [119 

The simple problem in maxima and minima which forms the 
fundamental basis of the method of least squares is the follow- 
ing: Let m unknown quantities be determined by observations 
in such a manner that they are not observed directly but enter 
intocertainA;woM)»i functional relations, /,(a;i,a;2, xs, .... x^), 
containing the unknown independent variables, xi, xi, xs, . . . x^. 
Let furthermore the number of observations on such functional 
relations be n (where n is greater than m). The problem is 
then to determine the most plausible system of the values of 
the unknown x's from the observed system. 

/l (Xl, Xi, X-i, . . . Xr^ = Ol 
/2 (Xl, X2, Xs, . . . Xm) =02 



when /i, f«, . . . fn are the known functional relations and 
Ol, 02, . . . On their observed values. Such equations are known 
as observation equations. 

In order to further simplify our problem we shall also assume 
that 

1 All the equations of the system have the same weight, and 

2 All the equations are reduced to linear form. 

By these assumptions the problem is reduced to find m in- 
knowns from n linear equations. 



ai Xl -\-biX2 + 


. =0i 


a2 Xl + hi X2 + . 


= 02 


aiXi +biX2 + . . 


= 03 


Ur^Xl + bnXz + . . 


■ • =o„ 



Since n is greater than m we find the problem over-deter- 
mined, and we therefore seek to determine the unknown quan- 
tites, Xl, X2, . . x„, in such a way that the sum of the squares 
of the differences between the functional relations and the ob- 
served values, becomes a minimum. This implies that the 
expression 

i =n 

J2iO'i^^+bi^i+ • • • -o,y = \p{xi,X2, ... a; J 
i = l 
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must be a minimum or the simultaneous existence of the equa 
tions. 

^=0 ^ = — = ^^^ 

dxi ' dxi ' ' ' ' dxm 

If we now introduce the following notation 

aiX\+'biX2+ — o, = Xi for i = li2i 3, n. 

The m equations in the above system (7) evidently take on 
the following form 

X161+X262+ . . . +X„6„ = 



If we now again re-substitute the expressions for \ in terms of 
the linear relations 

aiXi+hiX2+ Oi = 'Xi, for i = l, 2, 3, . . .n, 

and collect the coefficients of Xi, X2, . x,„ these equations 

may be expressed in the following symbolical form: 

[aa]xi + [ab]x2 + . — [ao] = 

[ab]xi + [bb]x2 + - [bo] = 



[ak]xi + [bk]x2 +....+ lkk]x^ - [ko] = 

where [aa] = ai^ + 02" + . 

[ab] = tti 61 + (12 62 + . . . 

is the Gaussian notation for the homogenous sum products. 

The above equations are known as normal equations, and it is 
readily seen that there is one normal equation corresponding 
to each unknown. Our problem is therefore reduced to the 
solution of a system of simultaneous linear equations of m un- 
knowns. If m is a small number, or, what amounts to the same 
thing, there are only two or three unknowns the solution can be 
carried on by simple algebraic methods or determinants. If 
the number of unknowns is large these methods become very 
laborious and impractical. It is one of the achievements of the 
great German mathematician, Gauss, to have given us a 
method of solution which reduces this labor to a minimum and 
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which proceeds along well defined systematic and practical 
lines. The method is known as the Gaussian algorithmus of 
successive elimination. 



120. Gauss' Solution of Normal Equations. — For the sake 
of simplicity we shall limit ourselves to a system of four normal 
equations of the form 

[aa]xi + [ab]xi + [ac]x$ + [ad]x4 — [ao] = 
[ab]xi + [bb]x2 + [bc]x3 + [bd]xi - [bo] = 
[ac]xi + lbc]x2 + [cc]x3 + [cd]xi — [co] = 
[ad]xi + [bd]x2 + [cdjxs + [dd]xi — [do] = 

The generalization to an arbitrary number of unknowns offers 
no difficulties, however. 

On account of their symmetrical form the above equations 
may also be written in the more convenient form, viz. : 

[aa]xi + [ab]xi + [acxa + [ad]xi — [ao] = 

[bb]x2 + [bc]x3 + [bd]xi - [bo] = 

[cc]xi + [cd]xi — [co] = 

[dd]Xi - [do] = 

From the first equation we find 

[ao] [ab] [ac] [ad] 

Xi = -— - —-X2 - —-Xi - — -X4 
[aa] [aa] [aa] [aa] 

Substituting this value in the following equations and by the 
introduction of the new symbol 

[ik] — , — { [ak] = [ik.l] 
[aa] 

we now obtain a new system of equations of a lower order and 
of the form 

[bb.l]x2 + [bc.l]xs + [bd.l]Xi - [bo.l] = 
[CC.1]X3 + [crf.l]X4 — [co.l] = 

[dd.l]Xi - [do.l] = 

Solving for Xi we have 

_ [6ol] _ [bc.l] _ [UA] 
""' ~ [66.1] [66.1]''' [66.1]'^' 
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Substituting in the following equations and writing 

[66.1] 

we have 

[cc.2]x3 + [cd.2]Xi = [co.2] 

[dd.2]xi = [do.2], or 

_ [m2] _ [cd.2] 
^' ~ [CC.2] ~ [cc.2f' 

Moreover, by writing 

[ik.2] - [ci.2] ^-^r; = [ik.3]., we have finally 
[cc.2] 

[dd.S]Zi = [rfo.3] 

This gives us the final reduced normal equation of the lowest 
order. By successive substitution we therefore have: 



Xi = 



[do.S] 
[dd.Z] 



_ [co.2] _ [cd.2] 
""' ~ [cc.2] ~ [cc.2]''' 

_ [boA] _ [bc^ _ [bd.l] 
""' " [66.1] [66.1]"^' [66.1]''' 

[ao] [ab] [ac] [ad] 

Xi = -— - j—xi - f — -Xi - - — -xa. 
[aa] [aa] [aa] [eta] 

as the ultimate solution of the unknowns. 



121. Arithmetical Application of Method. — The example 
in paragraph 117 gave an illustration of the application of the 
method of moments. As previously stated this method works 
quite well in cases of moderate skewness, but is less successful 
in extremely skew curves and where the excess is "large. We 
shall now give an illustration of the calculation of the para- 
meters by the method of least squares. The example we choose 
is the well-known statistical series by the distinguished Dutch 
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botanist, deVries, on the number of petal flowers in Ranunculus 
Bulbosus* This is also one of the classical examples of Karl 
Pearson in his celebrated original memoirs on skew variation. 
Although the observations of deVries lend themselves more 
readily to the method of logarithmic transformation, which we 
shall discuss in a following chapter, we have deliberately chosen 
to use it here for two specific reasons Firstly it is a most strik- 
ing illustration in refutation of the incautious criticism of the 
Gram-Charlier series by the aforementioned Mr. Davis. Sec- 
ondly (and this is the more important reason) it offers an ex- 
cellent drill for the student in the practical applications of the 
method of least squares because it gives in a very brief compass 
all the essential arithmetical details. The observations of 
deVries are as follows: 

No. 



of Petals 


X 


Fix) =0. 


5 





133 


6 


1 


55 


7 


2 


23 


8 


3 


7 


9 


4 


2 


10 


5 


2 



where F{x) denotes the absolute frequencies. The observed 
frequency distribution is well nigh as skew as it can be and rep- 
resents in fact a one-sided curve, and should therefore — if the 
statement by Mr. Davis is correct — show an absolute defiance 
to a graduation by the Gram-Charlier series. 

The process we shall use in the attempted mathematical 
representation of the above series is a combination of the method 
of semi-invariants and the method of least squares. Following 
Thiele's advice we determine the first two semi-invariants in 
the generating function directly from the observations while 
the coefficients of this function and its derivatives are deter- 
mined by the least square method. 

Choosing the provisional origin at 5, we obtain the following 
values for the crude moments. 



*Uiidoubtedly many readers will thiak that I have spent an unusual heavy amount 
of arithmetic on such a simple example. This criticism is true, and in actual curve 
fitting practice we would of course resort to a logarithmic transformation. The ex- 
ample is In this particular instance chosen as a drill for the student. I may, however 
remark that if one were to use Personian curves the arithmetical work would be even 
more formidable than through the application of least squares; because we would 
have to resort to mechanical quadrature formulas in order to compute the areas In 
Pearson's curves. 
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So = 222, Si = 140, 82 = 292, S3 = 806, S4 =2,752, S5 = 10,790, S6 = 
46,072, S7 = 207,226, from which we find that 

Xo = l, Xi =0.631, X2 = 0.917, X3 =1.644, >.4 =3.377, X6 = 5.972, 
l6= -2.911, >,7=- 122.638. 

All these semi-invariants with the exception of the two 
first are however, so greatly influenced by random sampling 
in the small observation series that it is hopeless to use them in 
the determination of the constants in the Gram-Charlier series. 
In fact an actual calculation does not give a very good result 
beyond that of a first rough approximation. The generating 
function, on the other hand, may be expressed by the aid of 
the two first semi-invariants as follows: 

I -22:2 

where z is given by the linear transformation: 

2; = (x-0.631):0.9576. (VT2 = 0.9576). 

We now propose to express the observed function F(x) or 
<p{z) by a Gram-Charlier series of the form: 

F{x) = (p{z)=ko<poiz)+k3<P3{z)+h(Pi{z)+ hipi{z). 

In this equation we know the values of the generating func- 
tion and its derivatives for various values of the variate z as 
found in the tables of J0rgensen and Charlier, while the quanti- 
ties k are unknowns. On the other hand we know 6 specific 
values of F{x) as directly observed in deVries's observation 
series. We are thus dealing with a system of typical linear 
observation equations of the forms described in paragraphs 
119 and 120 and which lend themselves so admirably to the 
treatment by the method of least squares. 

From the above linear relation between x and z we can directly 
compute the following table for the transformed variate z. 






-0.688 


1 


+0.402 


2 


+1.493 


3 


+2.583 


4 


+3.674 


5 


+ 4:.764: 
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The numerical values of ^o(z) and its derivatives as corres- 
ponding to the above values of z can be taken directly from the 
standard tables of J0rgensen and Charlier. We may there- 
fore write down the following observation equations (a) 



fo 



^. 



f. 



fe 



.3148^0 


-.5472*3 


+.1207*1 


+2.2728*5 


+ .9591*6 


-133 =0 


,3679*0 


+.4198*3 


+.7566*4 


-1.9836*5 


-2.9860*6 


- 55=0 


.1308*0 


+.1506*3 


-.7073*4 


+ .4540*5 


+2.8600*6 


- 23=0 


.0145*0 


-.1346*3 


+.1062*4 


+ .2642*5 


-1.2130*6 


- 7=0 


.0005*0 


-.0180*3 


+.0486*4 


- .1070*5 


+ .1482*6 


- 2=0 


.0001*0 


-.0005*3 


+ .0020*4 


- .0043*5 


+ .0540*6 


- 2 =0 



for which we now propose to determine the unknown values of 
k by the least square method. 

While this method may of course be applied directly to the 
above data, it will generally be found of advantage to start 
with some approximate values of the /c's. It is found in prac- 
tice that this approximate step saves considerable labour in 
the formation and ultimate solution of the normal equations. 

Although the first approximation in the case of numerous un- 
knowns must be in the nature of a more or less shrewd guess, 
which facility can only be attained by constant practice in rou- 
tine mathematical computing, we are, however, in this specific 
instance able to tell something about the nature of the coeffi- 
cients from purely a priori considerations. W-e know for in- 
stance from the form of the Gram-Charlier series that the 
coefficient ko of the generating function must be nearly equal 
to the area of the curve, which in this particular instance is 222. 
Moreover, a mere glance at the observed series tells us that it 
has a decidedly large skewness in negative direction from the 
mean coupled with a tendency of being "top heavy," indicating 
positive excess. We can therefore assume as a first approxima- 
tion that the coefficients of the derivatives of uneven order are 
negative and the coefficients of derivatives of even order are 
positive. Again, it is also seen that the coefficients of the de- 
rivatives of higher order than the fourth must be relatively 
small in comparison with the coefficients of the derivatives of 
lower order, otherwise the series would not be rapidly con- 
vergent. 

From such purely common sense a priori considerations we 
therefore guess the following first approximations, viz. : 

k\ = 222, k'3 = -10Q,k\= 20, k\ =-5,k\ = 5. 
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The probable values of the various k's may be written as 

ki = ri¥i for i = 0, 3, 4, 5 and 6, 

and our problem is therefore to find the correction factor r,- with 
which the approximate value A;\ must be multiplied so as to 
give ki. 

Applying the various values of ¥{ to the original observation 
equations in (a) we obtain the following schedule for the numeri- 
cal factors of r,. 



a 


b 


c 


d 


e 





5 


699 


+547 


+ 24 


-144 


+ 48 


-1,330 


-126 


817 


-420 


+151 


+ 99 


-149 


- 550 


- 52 


290 


-151 


-141 


- 23 


+143 


- 230 


-112 


32 


+135 


+ 21 


- 13 


- 61 


- 70 


+ 44 


1 


+ 18 


+ 10 


+ 5 


+ 7 


- 30 


+ 11 





+ 


+ 


+ 


+ 3 


- 10 


- 7 



1839 +129 +65 - 46 - 9 -2,220 —242 

where the additional control column s serves as a check. 

The subsequent formation of the various sum-products and 
normal equations is shown in the following schedules together 
with the s columns as a check. 



aa 


ab 




ac 




ad 


ae 




ao 


as 


488,601 


+382,353 


+ 


16,776 


- 


79,686 


+ 33,552 


- 


929,670 


- 88,074 


667,489 


-343,140 


+123,367 


+ 


80,883 


-121,733 


— 


449,350 


- 42,484 


84,100 


- 43,790 


— 


40,890 


- 


6,670 


+ 41,470 


- 


66,700 


- 32,480 


1,024 


+ 4,320 


+ 


672 


- 


416 


- 1,952 


- 


2,240 


+ 1,408 


1 


+ 18 


+ 


10 


+ 


5 


+ 7 


- 


30 


+ 11 





+ 


+ 





+ 





+ 


- 








1,241,215 


239 


+ 


99,935 


- 


5,884 


- 48,656 


-1,447,990 


-161,619 




bb 




6c 




bd 


be 




bo 


6s 




299,209 


+ 


13,128 


- 


63,258 


+ 26,256 


- 


727,510 


- 68,922 




176,400 


— 


63,420 


- 


41,580 


+ 62,580 


+ 


231,000 


+ 21,840 




22,801 


+ 


21,291 


+ 


3,473 


- 21,593 


+ 


34,730 


+ 16,912 




18,225 


+ 


2,835 


- 


1,7.55 


- 8,235 


- 


9,450 


+ 5,940 




324 


+ 


180 


+ 


90 


+ 126 


— 


540 


+ 198 







+ 





+ 





+ 


- 





+ 




516,959 


- 


25,986 


-102,130 


+ 59,134 


- 


471,770 


- 24,032 








cc 




cd 


ce 




CO 


cs 








576 


— 


2,736 


+ 1,152 


- 


31,920 


- 3,024 








22,801 


+ 


14,949 


- 22,499 


- 


83,050 


- 7,852 








19,881 


+ 


3,243 


- 20,163 


+ 


32,430 


+ 15,792 








441 


- 


273 


- 1,281 


— 


1,470 


+ 924 








100 


+ 


50 


+ 70 


— 


300 


+ 110 











+ 





+ 


- 









43,799 + 15,233 - 42,721 - 84,310 + 5,950 
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dd 

12,996 

9,801 

S29 

169 

25 





de 

- 5,472 

- 14,751 

- 3,289 
+ 793 
+ 35 
+ 



do 

151,620 

54,450 

5,290 

910 

150 





+ 14,364 

- 5,148 
+ 2,576 

- 572 
+ 55 
+ 



23,520 



22,684 + 103,220 + 11,275 



2,304 


- 


63,840 


- 


6,048 


22,201 


+ 


81,950 


+ 


7,748 


20,449 


- 


32,890 


— 


16,016 


3,721 


+ 


4,270 


- 


2,684 


49 


- 


210 


+ 


77 


9 


- 


30 


- 


21 



48,733 



10,750 



16,944 



We may now write the normal equations in schedule form as 
follows: 

Original Normal Equations 



(a) 


1,241,215 - 239 


+ 99935 


- 5884 


- 48656 


-1447990 


(1) 


+ 


- 19 


+ 1 


+ 9 


+ 278 


[h) 


+516959 


- 25986 


-102130 


+ 59134 


- 471770 


(2) 




+ 80^6 


- 474 


- 3917 


- 116582 


(c) 




+ 43799 


+ 15233 


- 42721 


- 84310 


(3) 






+ 28 


+ 231 


+ 6865 


id) 






+ 23520 


- 22684 


+ 103220 


(4) 








+ 1907 


+ 66761 


(e) 








+ 48733 


- 10750 



(5) 



.00019 



+.08051 



.00474 - 



-1.16659 



The sum-products from the observation equations are shown 
in the rows marked (a), (6), (c), (d) and (e). The row marked 
(5) and printed in italics is formed by dividing each of th 
figures in row (a) with 1,241,215. The row marked (1) con- 
tains the products of the figures in row (a) multiplied with the 
factor .00019. Row (2) is the products of the factor 0.08051 
and the figures in row (a), while row (3) is the product of the 
factor — 0.00474 and the figures in row (a). The products in 
row (4) are formed in the same manner by means of the factor 
-0.03920. 

We next subtract row (1) from row (6), row (2) from row (c), 
row (3) from row (d), and so forth, which results in the follow- 
ing schedule, which is known as the first reduction equation. 
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First Redtjction Equation 








(a) 


+516959 - 25967 


-102131 


+ 59125 


— 


472048 


(1) 


+ 1304- 


+ 5130 


- 2970 


+ 


23711 


(b) 


+ 35753 


+ 15707 


- 38804 


+ 


32272 


(2) 


. 


+ SO 177 


- 11681 


+ 


93268 


(c) 




+ 23492 


- 22915 


+ 


96355 


(3) 






+ 6762 


- 


63988 


id) 






+ 46826 


- 


67511 


(4) 


-.05023 


-.19766 


+ .1H37 


— 


.91313 



The above equations are treated in a similar manner as the 
original normal equations, and we have therefore the 2d reduc- 
tion equation of the form: 



(a) 
(1) 
(b) 
(2) 
(c) 



Second Reduction Equation 





+ 34451 


+ 


10577 


- 35834 


+ 


8561 






+ 


3247 


- 11002 


+ 


2628 






+ 


3315 


- 11234 

+ 37273 
+ 40064 


+ 


3097 

8905 

13523 






+ . 


30702 


-1.04014 


+ 


.24850 


Third 


' Reduction Equation 












+ 


68 


232 


+ 


469 



(3) 



(a) 
(1) 

(6) 



+ 791 - 1600 
+ 2791 - 4618 



(2) 



-3.41170 +6.89706 



Fourth Reduction Equation 

+ 



2000 - 3018 



The solution for the unknown r's may now be shown as fol- 
lows: 

rs =3018:2000 = 1.5090 

^3= _6.8971-(1.5090)(-3.4117) = -1.7488 

u = -0.2485- (-1.7488) (0.3070) - (1.5090) (-1.0401) =1.8580 

ra = 0.9131 - (1.8580) (-0.0502) - (- 1.7488) (- 0.1976) - (1.5090) 
(0.1144) =0.4884 

ro - 1.1666 - (0.4884) ( - 0.0002) - (1.8580) (0.0805) - ( -1.7488) 
(-0.0047) -(1.5090) (-0.0394) =1.0679 
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From the above values of r and by means of the relation 

fc, = r; fc,2 for i= 0, 8, 4, 5 and 6 

we can easily determine' the most probable values of A;,- with 
which the original observation equations as shown on page (228) 
must be multiplied so as to satisfy the observed values of F{x) 
in the sense as implied in the method of least squares. 
This results in the following arrangement: 



z 


*o'Po(2) 


iaV'sCz) 


ki'PiU) 


ks'PsU) 


ksV>&{z) 


2fe^i 


Obs. 


-0.688 


74.6 


+26.7 


+ 4.4 


+ 19.7 


+ 7.2 


132.6 


133 


+0.402 


87.2 


-20.5 


+28.1 


-17.2 


-22.4 


55.2 


55 


1.493 


31.0 


- 7.3 


-26.2 


+ 3.9 


+21.5 


22.9 


23 


2.583 


3.4 


+ 6.6 


+ 3.9 


+ 2.3 


- 9.1 


7.1 


7 


3.674 


0.2 


+ 0.9 


+ 1.8 


- 0.9 


+ 1.0 


3.0 


2 



+4.764 0.0 +0.0 +0.1 + 0.0 + 0.0 0.1 2 

The agreement between the calculated values and the origi- 
nally observed series leaves evidently little to be desired in the 
way of a satisfactory "fit." 

If we limited ourselves to three terms of the series and put 

F{x) = (p{z) = ^ki(pi{z} for z = 0, 3 and 4 

and then determined ka, ks and ^4 b y the method of least squares 
the final result would be of the form : 

^(z) =264.2^o(z) -89.9953(2) -5.2^4(2), 

for which the calculated values of the frequency function would 
be as follows: 



X 


F(x) 


Obs. 


Pearson 


5 


131.6 


133 


136.9 


6 


55.2 


55 


48.5 


7 


24.5 


23 


22.6 


8 


15.5 


7 


9.6 


9 


1.6 


'> 


3.4 


10 


0.2 





0.8 



The fit is evidently not so close as when we use 6 terms, but 
it is by no means a poor fit and does not require nearly so much 
arithmetical work as the larger number of terms in the fre- 
quency series. In this connection it is of interest to compare 
the present graduation with the result reached by Pearson, 
which is also shown in the above table. Taken all in all there 
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is no doubt in my mind that the serial expension gives far better 
results than the Pearsonian methods and does not entail nearly 
so much labour as these. 

Note on Adjusted Moments. — The theoretical moments are given 
in the form of definite integrals -nhile the observations always give us the 
moments on the form 

n= M n — }^a 

where a is the class interval of the observations. In order to determine 
the semi-invariants it is, however, required to know the continuous 
moments 

Mr=J'x'<p{x)dx 

00 

The values of s being given in the form of finite sums and not as defi- 
nite integrals are therefore subject to certain adjustments if we wish to 
express them as continuous moments. The necessary adjustments can, 
however, easily be performed by well-known formulas from the theory 
of mechanical quadrature if the frequency function and its derivatives 
vanish for x= — <^ and x = +x . The English mathem.atician, Sheppard, 
has among others developed the following simple formulas for the transi- 
tion from s to M : 

Mo=so, Ml =si, Mi =S2 — ]o«2, Mz =$3 — t si 
a2 , 7a* ,, 5a2 , 7a2 

Mi =S4 "2 S2 +240 ^0' -^5 =S5 -- g- S3 +^ Sl 

The Sheppard adjustments again emphasizes the fact that the method 
of moments works with curve areas instead of curve ordinates, which 
necessarily must lead to some sort of mechanical quadrature formula 
unless we are able to evaluate the indefinite integrals of the expressions 
for the frequency functions. If we use curve ordinates to calculate the 
specific numerical values of Pearson's frequency functions we are liable 
to encounter large errors. This fact is among other things pointed out 
by Caradog Jones who in mentioning the use of ordinates points out that 
"it must be remembered the resulting values are only a first approxima- 
tion to the observed frequencies and a better series is obtained if, by 
using some good quadrature formula, we calculate the AREAS for the 
successive groups between the curve, the bounding ordinates, and the 
axis of X." This is one of the great drawbacks to the otherwise elegant 
Pearsonian types of frequency curves because it entails a large amount 
of arithmetical work to compute specific numerical values from the final 
formulas as determined by Pearson's ciu-ve types. 

Any reader who will take the trouble to consult the original memoirs 
by Pearson and Elderton and the recently published treatise by Mr. 
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Caradog Jones will there find ample evidence of the large amount of 
tedious arithmetical work involved in the application of mechanical 
quadrature formulas. The recently suggested finite difference equation 
formulas by the American mathematican, Carver, while emphasizing the 
difficulty of applying the Pearson system, do not tend materially to shorten 
the arithmetical work very much, and Mr. Carver must in the final in- 
stance resort to mechanical quadrature. ' 

All these difficulties are, however, eUminated in the case of the deter- 
mination of the frequency function in serial form by means of the method 
of least squares where we work equally well with ordinates as with areas. 
A further advantage of the Gram — Charlier expansion in series is found 
in the fact that standard tables of the generating function and its deriva- 
tives as well as the definite integrals of these functions have been pubUshed 
both by Bruhns, Charlier and Jorgensen. Speaking from a purely per- 
sonal point of view I wish to state that through a long and varied experi-- 
ence in practical curve fitting to the most diverse kinds of statistical data 
I have had occasion to use both the Pearsonian and the Gram — Charlier 
type of curves, and while I fully recognize the theoretical elegance and 
apparent simplicity of the Pearson system, I feel nevertheless that from 
the point of view of the practical computer the older system as devised 
by the Scandinavian investigators is to be preferred in comparison with 
the methods advocated by the followers of the distinguished founder of 
the Enghsh Biometric school. 



*Mr. Carver's able and interesting analysis by means of finite difference equations 
is, however, to a great extent anteceded by the much earlier Danish memoirs of Opper- 
man and Gram where the finite difference equation methods are discussed. 



CHAPTER XVI 

LOGARITHMICALLY TRANSFORMED FREQUENCY 
FUNCTIONS 

122. Transformation of the Variate. — While it is alvrays 
possible to express all frequency curves by an expansion in 
Hermite polynomials, the numerical labor when carried on by 
the method of least squares often involves a large amount of 
arithmetical work if we wish to retain more than four or five 
terms of the series. Other methods lessening the arithmetical 
work and making the actual calculations comparatively simple 
have been offered by several authors and notably by Thiele, 
who in his works discusses several such methods. Among 
those we may mention the method of the so-called free func- 
tions and orthogonal substitution, the methods of correlates 
and the adjustment by elements. The chapters on these 
methods in Thiele's work are among some of the most import- 
ant, but also some of the most difficult in the whole theory of 
observations and have not always been understood and appre- 
ciated by the mathematicians, chiefly on account of Thiele's 
peculiar style of writing. A close study of the Danish scholar's 
investigations is, however, well worth while, and Thiele's work 
along these lines may still in the future become as epochmaking 
in the theory of probability as some of the researches of the 
great Laplace. The theory of infinite determinants as used by 
M. Fredholm in the solution of integral equations is another 
powerful tool which offers great advantages in the way of 
rapid calculation. All these methods require, however, that 
the student must be thoroughly familiar with the difficult 
theory upon which such methods rest, and they have for this 
reason been omitted in an elementary work such as the present 
treatise. 

We wish, however, to mention another method which in the 
majority of cases will make it possible to employ the Gram or 
Laplacean — Charlier curves in cases with extreme skewness 
or excess. We have here reference to the method of logarithmic 
transformation of the variate, x. 

235 
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123. The General Theory of Transformation. — One of 

the simplest transformations is the previously mentioned linear 
transformation of the form z=f{x) =ax+b, by which we can make 
two constants, Ci and d vanish. Other transformations sug- 
gest themselves, however, such as f(x} =ax'+bx+c, f{x) = V a;, 
fix) =logx and so forth. For this reason I propose to give a 
brief development of the general method of transformations of 
the statistical variates, mainly following the methods of Char- 
lier and J0rgensen. 

Stated in its most general form our problem is: If a fre- 
quency curve of a certain variate is given by F{x) what will be 
the frequency curve of a certain function of x, say/(x)? 

The equation of the frequency curve is y = F{x), which 
means that F{x)dx is the probability that x falls in the interval 
between a; — 3/^da; and x-\-}/2dx. The probability that a new 
variate z after the transformation z =f{x), or x (z) = x, falls in the 
interval z — }/2dz and z+}4,dz is therefore simply 

F[xiz)]x'{z)dz=Fix)dx, 

which gives in symbolic form the equation of the transformed 
frequency curve. 

The frequency for z =}{x) is of course the same as for x. The 
ordinates of the frequency curve, or rather the areas between 
corresponding ordinates, are therefore not changed, but the 
abcissa axis is replaced by /(x). Equidistant intervals of x 
will therefore not as a rule — except in the linear transformation 
■ — correspond to equidistant intervals of /(x). 

If, for instance, the frequency curve F{x) is the Laplacean 
normal curve 

1 -x2:2o-2 

F{x)= — ^^e 
o-V27r 

and if we let z =/(x) = x^ or x = V 2, we have evidently 

Fiz) =— — - =- 

<r^J2^^ 2Vz 

124. Logarithmic Transformation. — Of the various trans- 
formations the logarithmic is of special importance. 1 1 happens 
that even if the variate x forms an extremely skew frequency 
distribution its logarithms will be nearly normally distributed. 
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This fact was already noted by the eminent German psycholo- 
gist, Fechner, and also mentioned by Bruhns in his Kollek- 
tivmasslehre. But neither Fechner nor Bruhns have given a 
satisfactory theoretical explanation of the transformation and 
have limited themselves to use it as a practical rule of thumb. 

Thiele discusses the method under his adjustment by ele- 
ments, but in a rather brief manner. The first satisfactory 
theory of logarithmic transformation seems to have been given 
first by J0rgensen and later on by Wicksell.* J0rgensen first 
begins with the transformation of the normal Laplacean fre- 
quency curve. Letting z = logx and bearing in mind that the 
frequency of x equals that of logx we have 

z=f{x)=logx, or a;= x(,z) =e' and dx = e^dz 

The continuous power sums or moments of the rth order 
around an arbitrary origin take on the form 

/-j- oo 1/ x — m \2 /^oo l/ loQX — m \2 

x'e '^^ " ' dx={n^2T)-^NJx'e^^ " ' dx 



o 



= (wV 27r)-Wj e^'e ^^ « ^e'dz 

00 

The change in the lower limit in the second integral from 
— 00 to zero arises simply from the fact that the logarithm of 
zero equals minus infinity and the point — oo is thus by the 
transformation moved up to zero. 

By a straightforward transformation (see appendix) we may 
write the above integral as 



jy m(r+l)+i.i7i = (r+l)2 ^ + =0 m(T+l) + ;4n2(,r+l)l 






Changing from moments to semi-variants by means of the 
well-known relations 

lo=Mo 

\y=M^:Ma 

\2 = {MMo-M^^):M<? 



*Thelaw of errors, leading to the geometric mean as the most probable value of the 
variate as discovered by Sir Donald McAlister in 1879 may, however, be considered 
as a foreruni.er 01 Jtirgonsen's ■v\ork. 
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li = iM,Mo'—4:M3M,Mo'-SM2'Mo^+12M2Mmo-6M,') -.Mo' 



we have 



5,2 = e2-+3.»(gn^_l) 

1, = e4m+6.2(g6.^ _ 4g3n2 _ g^Sn^ + 126"' - 6) 



These equations give the semi-invariants expressed in terms 
of m and n. On the other hand if we know the semi-invariants 
from statistical data or are able to determine these semi-in- 
variants by a priori reasoning we may find the parameters m 
and n. 

125. The Mathematical Zero. — A point which we must 
bear in mind is that the above semi-invariants on account of 
of the transformation are calculated around a zero point which 
corresponds to a fixed lower limit of the observations. 

Very often the observations themselves indicate such a lower 
limit beyond which the frequencies of the variate vanish 
In the case of persons engaged in factory work t'li.-a is in most 
countries a well-defined legal age limit below which it is illegal 
to employ persons for work. Another example is o^fs'sd in 
the number of alpha particles radiated from certain radioactive 
metals. Since the number of particles radiated in a certain 
interval of time must either be zero or a whole positive number 
it is evident that — 1 must be the lower limit because we can have 
no negative radiations. Analogous limits exist in the age 
limit for divorces and in the amount of moneys assessed in the 
way of income tax. 

The lower limit allows, however, of a more exact mathe- 
matical determination by means of the following simple con- 
siderations. It is evident that this lower limit must fall below 
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the mean value of the frequency curve. Let us suppose that 
it is located at a certain point, a, at a distance of r; units from 
the mean M = Xi(x) — >7 = a; and let us furthermore as a be- 
ginning place the origin at ^i(x), in which case /li of course 
equals zero. By shifting the origin to a, which implies a 
translation of vj units in negative direction, the original variate 
(x) is transformed into x+yj, and ^i will now equal 57 while 
the semi-invariants of higher order remain the same as before, 
the transformation because of the well known relation 

\{x—i^) = \(x) for r >1 

We may therefore write the previously given relations 
between the /I's and m and n as follows: 

X2 = >?Ke"'-l) or e'''=l+;i2:>7' 

rv 

which reduces to /l3>7^ — 3^2^>7^ — ^2^ =0. 



^-■[(H'-<HH-'0 



The solution of this cubic equation which has one real and 
two imaginary roots gives us the valiie of >? or ^1 — a and thus 
determines the mathematical zero or lower limit. We have in 
fact: 

n^ = log(l+^2:>7^) and 

TO = log >7 — 1.5m^ while 

A/' = X.o:e'"+^''' 

126. Logarithmically Transformed Frequency Series. — 

We have already shown that the generalized frequency curve 
could be written as 

„, , , s Cliflix) , C2^2(X) C3<p3ix) 

Fix) = co(po{x) - — -, h — ^^ ^^ \- 

where the Laplacean probabihty function 






o(x)=^=e ''-' 



(TV27r 
is the generating function with M and o- as its parameters. 
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The suggestion now immediately arises to use an analogous 
series in the case of the logarithmic transformation. In this 
case the frequency curve, F{x), with a lower limit would be ex- 
pressed as follows: 

Fix) = koioix) r^ h 



1! 2! 3! ' 

while the generating function now is 

-| 1 flog a:— 77112 

n\ 2ir 

where m and n are the parameters. 

Using the usual definition of semi-invariants we then have 

XiM + Xm 2 +X3& )3+ 

II 21 31 SiCO S203^ Saw' 

r°t.,r; ^ / ^ ki^^{x) ki^iix) kz^zix) ,, 

= J e"^[ka^o{x) ~^— H — Z]~ ' ' ' ' 

o 
The general term on the right hand side integral is of the form 

/CO 
e''"^Xx)dx 
o 

where the integral may be evaluted by partial integration as 
follows: 



/OO 00 /* 



e""<l>,(a;)dx = e^"'I'^i(x)]-w I e^"*,-i(a;) da; 



Since both $o(a;) and all its derivatives are supposed to 
vanish for a; = and x = oo the first term to the right becomes 
zero and 

/OO /»oo 

e'"^^,ix)dx= -01 J e'"^9,-^{x)dx 



By successive integrations we then obtain the following 
recursion formula 
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( -coyje^"^,-,ix)dx = ( -co)^ Je^"$,-2(x)da; 
o o 



o o 



o o 

Or finally 



fx /*00 

e""-I>,(x)rfx = (-co) J e^"*o(x)rfa; 



o 



Expanding e""" in a power series we have 



o o 

The general term in this expansion is of the form 

r />x irlogi — m"|2 



— == -r f X e ax 

ny, 1-K r!J 
o 



which according to the formulas given on page (23 7) reduces to: 

Hence we may write 

/ e'^"$,(x)rfx = (— m) ^jfi co'^:r! 

o r =0 

Consequently the relation between the semi-invariants and 
the frequency function 

F(x)=A;ofo(x)-^$i(x)+^*2(x)-^3(x)-|- 
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can be expressed by the following recursion formula 

^r 21 31 , Sico Siu' S3W' 

t) =0 s =0 r =0 

The constants A; are here expressed in terms of the unadjusted 
moments or power sums, s. It is readily seen that the Sheppard 
corrections for adjusted moments, M, also apply in this case. 
We are, therefore, able to write down the values of the k's from 
the above recursion formula in the following manner 

ilf 2 = A;2e"+^"' +2A;ie-'"+2"' + /fcoe^'"-^ *'^"' 

Mi = ^46"'+^"' +4^362"'+^'" +6/(;2e3'"+*-^'" +4A;ie*"+^"' 

It is easy to see that it is not possible to determine the gener- 
ating function's parameters m and n from the observations. 
These parameters like M and o in the case of ,the Laplacean 
normal probability curve must be chosen arbitrarily. If 
m and n are selected so as to make ki and k^ vanish we have 

M, = koe^^+*-^''' 
the solution of which gives 

„.^MoM2 3„ Mi^ ikfo»M2 

while 

fcie" ' ' ^"^ = M4 - 4M3e" ^ 1 •■^"' - Moe*"" -^ ""'(e-^"' - 4) 
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This theory requires the computation of a set of tables of the 
generating function 

1 ir log x—m l^ 

$o(x)=^=e~^L— ;^J 
■rev 2ir 

and its derivatives. For i(,{x) itself we may of course use the 
ordinary tables for the normal curve <po(z) when we consider 

log x—m 

z = 

n 

I have calculated a set of tables of the derivatives of fo(,x) 
and hope to be able to publish the manuscript thereof in the 
second volume of this treatise. 

127. Parameters Determined by Least Squares. — The 

above development is based upon the theory of functions and 
the theory of definite integrals. We shall now see how the 
same problem may be attacked by the method of least squares 
after we have determined by the usual method of moments the 
values of m and n in the generating function ^o(z). 

Viewed from this point of vantage our problem may be 
stated as follows: 

Given an arbitrary frequency distribution, of the variate z 
with z = (log x—in):n and where x is reckoned from a zero 
point or origin a, which is situated v units below the mean and 
defined by the relation 

7/^X3— 317^X2^ = >.2', where v = 'ki-a; 

to develop F(z) into a frequency series of the form 

F{z)=k}<Po{z)+k3<Piiz)+h<Piiz)+ . . , .+kn<Pn(z), 

where the k's must be determined in such a way that the ex- 
pression 

1=71 

JlkiiPiiz) 

gives the best approximation to F{z) in the sense of the method 
of least squares. 

Stated in this form the frequency function is reduced to the 
ordinary series of Gram or the A type of the Charlier series, 
already treated in the earlier chapters. 
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128. Application to Graduation of a Mortality Table. — 

As an illustration of the theory to a practical problem we pre- 
sent the following frequency distribution by 5-year age intervals 
of the number of deaths (or Id^ by quinquennial grouping) 
in the recently published American-Canadian Mortality of 
Healthy Males, based on a radix of 100,000 entrants at age 15. 

Frequency Distribution of Deaths by Attained Ages in 
American-Canadian Mortality Table 



Ages 


Xdx 


1st Component 


2(1 Component 


15- 19 


1,801 


120 


1,681 


20- 24 


1,996 


230 


1,766 


25- 29 


2,089 


440 


1,649 


30- 34 


2,120 


790 


1,330 


35- 39 


2,341 


1,370 


971 


40- 44 


2,911 


2,270 


641 


45- 49 


3,937 


3,570 


367 


50- 54 


5,527 


5,400 


127 


55- 59 


7,723 


7,722 


1 


60- 64 


10,383 


10,383 




65- 69 


12,987 


12,987 




70- 74 


14,535 


14,535 




75- 79 


13,807 


13,807 




80- 84 


10,328 


10,328 




85- 89 


5,464 


5,464 




90- 94 


1,757 


1,757 




95- 99 


278 


278 




100-104 


16 


16 





100,000 91,467 8,533 

The curve represented by the d^^ column is evidently a com- 
posite frequency function compounded of several series. From 
a purely mathematical point of view the compound curve may 
be considered as being generated in an infinite number of ways 
as the summation of separate component frequency curves. 
From the point of view of a practical graduation it is, however, 
easy to break this compound death curve up into two separate 
components. A mere glance at the d^ curve itself suggests a 
major skew frequency curve with a maximum point somewhere 
in the age interval from 70-75 and minor curve (practically 
one-sided) for the younger ages. 

Let us therefore break the -d^ column up into the two so far 
perfectly arbitrary parts as shown in the above table and then 
try to fit those two distributions to logarithmically transformed 
A curves. 
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Starting with the first comixHoent the straightforward com- 
putation of the semi-invaiiante is gi\-eii in the table below with 
the providonal mean chosen at age 67. 

Frkqckxct DisTKLBmox OF Deaths ix Ahebiea^ Mortautt Tabus 

FlKST COMPOXK>>T 



A^^ 


S 


-•".JT- 


jrfjr- 


jiF.I> 


jrSFiJr* 


KM-100 


— 4 


16 


112 


7S4 


5.4SS; 


99- V>5 


- 6 


27S 


I.1V.VS 


lO.OOS 


eaois 


94- 90 


— 5 


1,757 


<.7So 


43,925 


219,625 


Si^- So 


- 4 


5,4«>4 


21.S0O 


S7.424 


349.696 


S4- SO 


- 3 


lO.Sis 


30,9S4 


92,952 


2.NSoti 


7\V To 


_ -■> 


13.N.17 


27.1U4 


55.-22S 


lia4o6 


74- 70 


- 1 


14.535 


14,535 


14.535 


14.535 


69- 65 


- 


li;.;iS7 













■^ 


5v\172 


11X5,554 


304.S56 


1.03S.7LU 


IH- 60 


-r 1 


U\:>S^ 


10.3S3 


10.SS3 


lasss 


oil- 55 


— 2 


7,723 


15.446 


30.S92 


61.7S4 


54- oil 


— o 


5,400 


16.200 


4S.600 


14O.S00 


49- 4o 


+ 4 


3.570 


14.2m1 


57.1-20 


22S.4S0 


44- 40 


— 5 


2.270 


11.350 


50,750 


2S3.750 


39- « 


- 6 


1,370 


S.-22i^ 


49.320 


295.920 


34- 30 


-r ~ 


TW) 


5.530 


3S.710 


27a970 


>^ 35 


— s 


440 


3.520 


■2S.UV 


22O.-2S0 


24- -20 


— 9 


230 


2,070 


IS.ivV 


Io7.o70 


19- 15 


+10 


lAi 


1.200 
SS.199 


12.iXX> 
350.5iV5 


i2o.a» 




■V 


32,2t>(> 


1.S10.037 




^<. 


vn.4tiS 


— 17.355 


IVV..421 


771,333 



Computing the semi-invariante by means of the usual for- 
mulas in paragraph 104, we have: 

/,, = — 173o5:9146S= -0.1S974,ormeanatage67+5(0.19)OTat 

age 67.9-5 

>.i = eo54-21 :9146S - >,- = 7.1296 

/,* = 771333:9146s -3/-,a;-2>-:' = 12.49S1 

In order to determine the mathematical zero or tiie <Higin 
we ha V e to solve the following cubic : 

Ljr — o>'.i"y~ = Ai . or 

12.4SiV- 152.511 r = 362.47 

the positive root of which is equal to 12.39. The zoo point 
is therefore found to be dtuated 12.39 5-year units from the 
mean or at age 67.95— 5^12.39>,, i. e. vary nearly at age 130, 
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which we henceforth shall select as the origin of the co-ordinate 
system of the first component. We have furthermore 

12.39 = 6"+ '•5"', and 7.1296 = e^""''^''\e''' -1) ={12.39yie''" -1), 

the solution of which gives n' = 0.04436, n = 0.2106, m =2.4504, 
all on the basis of a 5-year interval as unit. If we wish to 
change to a single calendar year unit we must add the natural 
logarithm of 5, or 1.6094, to the above value of m, which gives us 
m =4.0598, while n remains the same. The above computa- 
tions furnish us with the necessary material for the logarithmic 
transformation of the variate x which now may be written as 

z = [log (130 -x)- 4.0598] :0.2106, 

where x is the original variate or the age at death. 

Having thus accomplished the logarithmic transformation 
we may henceforth write the generating function as 



1 r log( 130-a:)-4.0598 -l 2 -i 

$„(X) = __^ — '2^ 0.2106 J ^^^(^) ^_^g 



2L 0.2106 J i-„\ ,-^°:3 

.2106V 27r V 27r 



We express now F{x) by the following equation. 

F(x) =kotoix) +h<i?3(x) +ki<^i{x) + .... 

or in terms of the transformed z: 

<p{z) =ko<pi{z) +ki(P3{z) +ki<Pi(z) + . . . . , 

and proceed to determine the numerical values of k by the 
method of least squares. 

129. Formation of Observation Equations. — The values 
of voiz) and its 3rd and 4th derivatives may be written down 
directly from the tables of J0rgensen or Charlier for various 
values of z as shown in detail in the following scheme on 
the following pages.* 



(1) 


(2) 


(3) 


(4) 


(5) 


(6) 


(7) 


(8) 


(9) 


Age 


Z 


lp0(2) 


ifiU) 


^1(2) 


*o(3) 


*:3(4) 


*:4(5) 


J'lC^) 


15 


+3.257 


+.0020 


-.0491 


+.1029 


+ 14 


+ 10 


- 1 


23 


6 


3.213 


.0023 


.0537 


.1084 


17 


10 


1 


26 


7 


3.170 


.0026 


.0.586 


.1146 


19 . 


12 


1 


30 


8 


3.127 


.0030 


.0637 


.1208 


22 


14 


1 


35 



*The values of z, co-ordinate with those oix are computed for all integral values of 
X from 15 to 100 In accordance with the previously established relation viz 2=lloe 
(130-a:)-4. 06981:0. 2106. Alllogarithms on base e. 
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(1) 


(2) 


(3) 


(4) 


(5) 


(6) 


(7) 


(8) 


(9) 


Agei! 


z 


<po(.^) 


<p3(.2) 


1^4(2) 


fco(3) 


^3(4) 


M5) 


Fiiz) 


9 


3.085 


.0034 


.0688 


.1249 


25 


15 




39 


20 


3.041 


.0039 


.0744 


.1290 


29 


16 




44 


1 


2.999 


.0044 


.0803 


.1331 


32 


17 




48 


2 


2.955 


.0051 


.0850 


.1361 


38 


18 




57 


3 


2.911 


.0057 


.0919 


.1382 


42 


19 




60 


4 


2.866 


.0065 


.0981 


.1391 


48 


21 




68 


25 


2.821 


.0074 


.1044 


.1390 


54 


22 




75 


6 


2.776 


.0085 


.1104 


.1367 


63 


24 




86 


7 


2.730 


.0096 


.1168 


.1328 


71 


25 




86 


8 


2.683 


.0110 


.1229 


.1264 


81 


26 




106 


9 


2.637 


.0123 


.1286 


.1159 


91 


27 




117 


30 


2.587 


.0140 


.1340 


.1072 


103 


28 




130 


1 


2.542 


.0150 


.1387 


.0943 


116 


29 




144 


2 


2.494 


.0178 


.1420 


.0763 


131 


30 




160 


3 


2.445 


.0201 


.1462 


.0576 


149 


31 




179 


4 


2.396 


.0226 


.1486 


.0340 


166 


32 





198 


35 


2.346 


.0255 


.1496 


+.0039 


188 


32 


- 


220 


6 


2.296 


.0286 


.1489 


-.0275 


210 


32 


+ 


242 


7 


2.245 


.0320 


.1464 


.0622 


236 


31 


1 


268 


8 


2.193 


.0360 


.1423 


.0983 


265 


30 


1 


296 


9 


2.142 


.0402 


.1393 


.1399 


296 


29 


1 


326 


40 


2.089 


.0450 


.1281 


.1864 


331 


27 


2 


260 


1 


2.036 


.0502 


.1170 


.2355 


369 


25 


2 


396 


2 


1.982 


.0559 


.1030 


.2875 


411 


22 


3 


436 


3 


1.928 


.0621 


.0859 


.3412 


452 


18 


3 


478 


4 


1.873 


.0690 


.0656 


.3965 


508 


14 


4 


526 


45 


1.822 


.0757 


.0442 


.4474 


557 


9 


4 


570 


6 


1.762 


.0845 


-.0156 


.5060 


622 


+ 3 


5 


630 


7 


1.704 


.0934 


+ .0154 


.5596 


687 


- 3 


6 


690 


8 


1.647 


.1028 


.0487 


.6082 


758 


10 


6 


754 


9 


1.589 


.1129 


.0853 


.6419 


832 


18 


6 


820 


50 


1.529 


.1239 


.1255 


.6893 


913 


27 


7 


893 


1 


1.471 


.1352 


.1599 


.7132 


994 


34 


7 


967 


2 


1.409 


.1479 


.2114 


.7349 


1,089 


45 


7 


1,051 


3 


1.348 


.1609 


.2565 


.7430 


1,185 


54 


7 


1,138 


4 


1.286 


.1745 


.3022 


.7307 


1,288 


63 


7 


1,231 


55 


1.224 


.1886 


.3467 


.7062 


1,391 


74 


7 


1,324 


6 


1.160 


.2035 


.3907 


.6642 


1,501 


83 


7 


1,425 


7 


1.095 


.2190 


.4320 


.6037 


1,612 


92 


6 


1,526 


8 


1.030 


.2347 


.4688 


.5257 


1,730 


99 


5 


1,636 


9 


0.963 


.2509 


.5008 


.4180 


1,847 


106 


4 


1,745 


60 


0.896 


.2672 


.5257 


.2911 


1,965 


112 


3 


1,856 


1 


0.828 


.2832 


.5426 


.1831 


2,083 


115 


2 


1,970 


2 


0.758 


.2994 


.5489 


-.0350 


2,201 


116 


+ 


2,085 


3 


0.689 


.3146 


.5474 


+ .1187 


2,318 


116 


- 1 


2,201 


4 


0.617 


.3298 


.5329 


.2839 


2,428 


113 


3 


2,312 
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(1) 


(2) 


(3) 


(4) 


(5) 


(6) 


(7) 


(8) 


(9) 


Ag©» 


z 


^o(z) 


<f>3(^) 


<PiU) 


*o(3) 


ksW 


*4(5) 


Filz) 


65 


0.543 


.3443 


.5056 


.4537 


2,532 


107 


5 


2,420 


6 


0.470 


.3572 


.4666 


.6156 


2,627 


99 


6 


2,522 


7 


0.396 


.3689 


.4152 


.7686 


2,716 


88 


8 


2,620 


8 


0.319 


.3792 


.3505 


.9098 


2,789 


74 


9 


2,706 


9 


0.243 


.3873 


.2768 


1.0262 


2,848 


59 


10 


2,779 


70 


0.164 


.3937 


.1918 


1.1176 


2,900 


41 


11 


2,848 


1 


.084 


.3975 


.0999 


1.1757 


2,929 


21 


12 


2,896 


2 


+0.000 


.3989 


+.0119 


1.1968 


2,937 


- 3 


12 


2,922 


3 


-0.080 


.3976 


-.0952 


1.1777 


2,929 


+ 20 


12 


2,937 


4 


0.164 


.3937 


.1918 


1.1176 


2,900 


41 


11 


2,930 


75 


0.249 


.3868 


.2829 


1.0180 


2,848 


60 


10 


2,898 


6 


0.348 


.3755 


.3762 


.8592 


2,767 


80 


9 


2,848 


7 


0.425 


.3645 


.4368 


.7043 


2,686 


93 


7 


2,772 


8 


0.516 


.3493 


.4912 


.5146 


2,569 


104 


5 


2,668 


9 


0.608 


.3316 


.5303 


.3069 


2,444 


112 


3 


2,553 


80 


0.702 


.3118 


.5502 


+.0892 


2,296 


117 


- 1 


2,412 


1 


0.798 


.2902 


.5473 


-.1204 


2,134 


116 


+ 1 


2,251 


2 


0.896 


.2672 


.5257 


.3130 


1,965 


112 


3 


2,080 


3 


0.996 


.2436 


.4859 


.4380 


1,788 


103 


4 


1,895 


4 


1.098 


.2185 


.4302 


.5899 


1,612 


91 


9 


1,709 


85 


1.203 


.1934 


.3614 


.6943 


1,420 


77 


7 


1,504 


6 


1.309 


.1694 


.2854 


.7358 


1,244 


60 


7 


1,311 


7 


1.418 


.1460 


.2048 


.7340 


1,075 


43 


7 


1,122 


8 


1.529 


.1240 


.1255 


.6893 


913 


27 


7 


947 


9 


1.644 


.1034 


-.0505 


.6106 


758 


+ 11 


6 


775 


90 


1.762 


.0845 


+.0156 


.5060 


622 


- 3 


5 


624 


1 


1.SS2 


.0679 


.0693 


.3874 


500 


15 


4 


489 


2 


2.004 


.0536 


.1090 


.2663 


394 


23 


3 


374 


3 


2.132 


.0397 


.1380 


.1485 


292 


29 


1 


264 


4 


2.260 


.0310 


.1478 


-.0483 


228 


31 


+ 


197 


95 


2.393 


.0227 


.1477 


+ .0325 


167 


31 


- 


135 


6 


2.530 


.0163 


.1399 


.0905 


120 


30 


1 


89 


7 


2.673 


.0100 


.1207 


.1295 


76 


26 


1 


49 


8 


2.821 


.0074 


.1044 


.1386 


54 


22 


1 


29 


9 


2.968 


.0050 


.0842 


.1353 


37 


18 


1 


18 


100 


-3.124 


.0028 


.0640 


.1203 


21 


14 


1 


6 



Since the original observations of d^^ are given in 5-year age 
intervals it becomes necessary to sum the numerical values of 
ipo{z) and its derivatives by quinquennial age groupings so as to 
form the required observation equations. We find thus for 
instance in the age interval 55-59 the following observation 
equation (the summation to take place from x = 55 to x = 59) 

koSipoiz) +k3'^(P3{z) +ki^(Pi{z) ='^(p{z) =0^, or 
1.0967/co+2.1390A;3 -2.9178/c4 = 7722 
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Similar equations are formed for the other age intervals, re- 
sulting in the following tabular representation of the coefficients 
to the various k's and the observed values of the frequency dis- 
tribution, (p{z) 

TABLE I 

Tabular Abbangements of Numerical Data in the Obseevation 

Equations 

Ages Vo >P3 

15-19 .0130 - .2930 

20-24 .0256 - .4305 

25-29 .0488 - .5833 

30-34 .0903 - .7104 

35-39 .1623 - .7265 

40-44 .2822 - .4996 

45-49 .4673 + .0896 

50-54 .7424 +1.0555 

55-59 1.0967 +2.1390 

60-64 1.4942 +2.6975 

65-69 1.8369 +2.0147 

70-74 1.9814 + .0166 

75-79 1.8077 -2.1174 

80-84 1.3307 -2.5393 

85-89 .7362 -1.0276 

90-94 .2729 .4571 

95-99 .0609 .5714 

100- .0068 .1714 

From the above table we notice that we have 18 observation 
equations from which to determine the three unknown para- 
meters ko, ks and k4. The number of equations being greater 
than the number of unknowns we make use of the method of 
least squares. While a direct application of this principle of 
course is feasible, it will, however, be found easier to start with 
an approximate solution for h, ki and ^4 and then apply the 
method of least squares. It will be found that in the three age 
intervals 65-69, 70-74 and 75-79 where the observations are most 
numerous the observations will be approximately satisfied by 
the following preliminary values of k, viz. : 

A;io = 7300, kh= -340 and kh= -50. 

Multiplying the above values of k with their respective 
columns in Table I, or in other words forming the products 
itVo, kh<Pi and ¥i<pi, we obtain a new table of the following form:* 

*Last figure omitted. 



'Pi 





+ .5730 


120 


+ .5758 


230 


+ .6508 


440 


+ .3694 


790 


- .3240 


1,370 


-1.4471 


2,270 


-2.7631 


3,570 


-3.6111 


5,400 


-2.9178 


7,722 


- .1066 


10,383 


+3.7739 


12,987 


+5.7854 


14,535 


+3.6030 


13,807 


-1.3721 


10,328 


-3.4640 


5,464 


-1.3925 


1,757 


.6125 


278 


.3890 


16 
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TABLE 


II 




Control 
Column 


Ages 


a 


b 


C 





a 


15- 19 


10 


10 


- 3 


- 12 


5 


20- 24 


19 


15 


- 3 


- 23 


8. 


25- 29 


36 


20 


- 3 


- 44 


9 


30- 34 


66 


24 


- 2 


- 79 


9 


35- 39 


119 


25 


2 


- 137 


9 


40- 44 


206 


17 


7 


- 227 


3 


45- 49 


343 


- 3 


14 


- 357 


- 3 


50- 54 


542 


-36 


18 


- 540 


-16 


55- 59 


801 


-73 


15 


- 772 


-29 


60- 64 


1,091 


-92 


1 


-1038 


-38 


65- 69 


1,341 


-69 


-19 


-1299 


-46 


70- 74 


1,446 


- 1 


-29 


-1454 


-38 


75- 70 


1,320 


72 


-18 


-1381 


- 7 


80- 84 


971 


86 


7 


-1033 


31 


85- 90 


537 


35 


17 


- 546 


43 


90- 94 


202 


-16 


7 


- 176 


17 


95- 99 


45 


-20 


- 3 


- 28 


- 6 


100-104 


5 


- 6 


- 2 


2 


- 5 



9,100 -12 6 -9148 -54 

The formation of the sum-products [aa], [ab], [ao], 

[bb], [be], [bo] and [cc], [co] proceeds now in routine manner as 
shown in the following tables: 



[aa] \ab] [ac] [ao] [as] 

100 100 - 30 - 120 50 

361 285 - 57 - 437 152 

1,296 720 - 108 - 1,584 324 

4,356 1,584 - 132 - 5,214 594 

14,161 2,975 238 - 16,303 1,071 

42,436 3,502 1,442 - 46,762 618 

117,649 - 1,029 4,802 - 122,451 - 1,029 

293,764 - 19,512 9,756 - 292,680 - 8,672 

641,601 - 58,473 12,015 - 618,372 - 23,229 

1,190,281 -100,372 1,091 -1,132,458 - 41,458 

1,798,281 - 92,529 -25,479 -1,741,959 - 61,686 

2,090,916 - 1,446 -41,934 -2,102,484 - 54,948 

1,742,400 95,040 -23,760 -1,822,920 - 9,240 

942,841 83,506 6,797 -1,003,043 30,101 

288,369 18,795 9,129 - 293,202 23,091 

40,804 - 3,232 1,414 - 35,552 3,434 

2,025 - 900 - 135 - 1,260 - 270 

25 - 30 - 10 - 10 - 25 

9,211,666 - 71,016 -44,961 -9,236,811 -141,122 
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TABLE IV 



[bbJ 




|6cl 




[bo] 




lbs] 


100 




- 30 




- 120 




50 


225 




- 45 




- 345 




120 


400 




- 60 




- 880 




180 


576 




- 48 




- 1,896 




216 


625 




50 




- 3,425 




225 


289 




119 




- 3,859 




51 


9 




- 42 




1,071 




9 


1,296 




- 648 




19,440 




576 


5,329 




-1,095 




56,356 




2,117 


8,464 




- 92 




95,496 




3,496 


4,761 




1,311 




89,631 




3,174 


1 




29 




1,454 




38 


5,184 




-1,296 




-99,432 




-504 


7,396 




602 




-88,838 




2,666 


1,225 




595 




-19,110 




1,505 


256 




- 112 




2,816 




- 272 


400 




60 




560 




120 


36 




12 




12 




30 


36,572 


Ice] 

9 
9 
9 

4 

4 

49 

196 

324 

225 

1 

361 

841 

324 

49 

289 

49 

9 

4 


- 690 


TABLE V 

\co] 

36 

69 

132 

158 

- 274 

- 1,589 

- 4,998 

- 9,720 
-11,580 

- 1,038 
24,681 
42,166 
24,858 

- 7,231 

- 9,282 

- 1,232 

14 
4 


48,931 


Us] 

- 15 

- 24 

- 27 

- 18 
18 
21 

- 42 

- 288 

- 435 

- 38 
874 

1,102 
126 
217 
731 
119 
18 
10 


13,797 



2,756 45,241 2,349 

From the above tables we may now write down the following 
scheme for the solution of the normal equations by means of 
the Gaussian algorithmus 
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TABLE VI 
Scheme for the Solution of Normal Equations 

921,167 - 7102 - 4496 - 923681 

55 35 7122 

3657 - 69 4893 

22 4508 

276 4525 



-.00771 



-.00488 



-1.00273 



3602 



- 104 


- 2229 


3 


64 


254 


16 



-.02887 



251 



48 



Solving for the unknowns we have now 



r4= 48:251 = .19123 

n= .61882 -(.19123) (-.02887): 



.62434 



n = 1.00273 - (.62434) ( - .00771) - (.19123) ( - .00488) 
= 1.00847 



We therefore find the following numerical values for the 
probable values of ko, ks and ki 



U = r,k\ = (1.00847) (7300) = 7,361.8 
/c3 = r3A;i3 = (.62484) (-340) = -212.2 
/c4 = r4A;i4 = (.19123) (- 50) = - 9.6 

The next step is then to form the three columns kiopoiz), 
ksipaiz) and knpi{z) for the individual ages from 15 and upwards. 
The formation of '^k^ipiiz) gives us finally the separate values 
by integral ages of the first component curve or Fi{x). 

If we now subtract this component from the originally ob- 
served values of the compound curve, dj-, we obtain the follow- 
ing values (arranged in quinquennial age groups) for the second 
component, Fuix). 
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Ages 


Fuix) 


0- 4 


50] 


5- 9 


440 > Hypothetical Values 
1,210 J 


10-14 


15-19 


1,650 


20-24 


1,719 


25-29 


1,610 


30-34 


1,309 


35-39 


989 


40-44 


715 


45-59 


473 


50-54 


247 


55-59 


67 



10,479 

It is of course possible to fit this particular curve type directly 
to a logarithmically transformed Gram or Charlier A type of 
frequency curves, although this will require 5 or 6 terms of the 
series. But still greater obstacles would be encountered if we 
were to attempt a graduation by means of Pearsonian curve 
types. The goal can, however, be reached quite readily by the 
introduction of a certain hypothetical device. Any reader 
familiar with the various types of frequency curves will readily 
notice that the above frequency distribution of Fii{x) repre- 
sents a truncated curve from which the curve segment corres- 
ponding to ages below 15 has been eliminated. We may there- 
fore substitute the following, so far hypothetical values for the 
missing curve segment. For ages 0-4 the value of 50, for ages 
5-9 the value of 440, and for the ages 10-14 the value of 1,210. 
The thus reconstructed histograph (shown in the above table) 
may now be fitted to a logarithmically transformed Gram or 
Charlier curve in the usual routine manner. The computations 
of the relative moments m^ result in the following values, for 
a provisional origin at the age of 17 and a unit interval of 5 years 

mi = 1.8380, m2 = 8.6342, ma =39.8630 

From which we find 

Xi = 1.838, X2 = 5.2560, X3 =4.5824. (Mean at age 26.19.) 

The equation 

;^3>7'-3V>7' = V 
then becomes 

4.582>7' - 82.876>7' = 145.200, 

the real root of which is 5^ = 19.0 (on basis of a 5 year unit.) 
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We furthermore find that n = 0.120 and m = 2.9227 +Zofif,5 = 
4.5321, which finally brings about the transformation of the 
variate x by means of the formula 

z = [log,(x+68.8)-4.532]:0.12 

where x is expressed in unit intervals of 1 year. 

The further determination of the coefl^icients ko, h and h by 
means of the method of least squares results in the values: 

A;o = 947.4, A;3=-63.4 and A;4=-30.0. Multiplying these 
values with their respective values oi<fo{z), <p3{z) and (Pi{z) and 
forming the corresponding sums we finally obtain the second 
component curve. 
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FIGXJRE3 

Diagram showing graduation of Ax column in the A M{^' ) hihle by a compound frequency curve of the 

Gram-Cliai'licT types. 

The sum of Fi{x) and Fuix) as shown on page 255 and also 
in the figure gives us the final compound frequency curve or 
the dj; curve, from which it now is a simple matter to form the 
l^ or -dj: column and its co-ordinated column of q^. 



Graduation of American Male Mortality Table (AM (5)) by means 
OP a Compound Frequency Curve 



Age 


Fl(x) 


Frr(x) 


dx 


h 


lOOfk;^ 


15 


21 


302 


323 


100532 


3.21 


6 


26 


319 


.345 


100209 


3.44 


7 


30 


332 


362 


99864 


3.62 
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Age 


FHx) 


FiHx) dz 


K 


10005I 


8 


35 


342 377 


99502 


3.79 


9 


39 


350 389 


99125 


3.92 


20 


44 


354 398 


98736 


4.03 


1 


48 


354 402 


98338 


4.11 


2 


57 


352 409 


97936 


4.18 


3 


60 


349 409 


97527 


4.19 


4 


68 


343 411 


97118 


4.23 


25 


75 


336 411 


96707 


4.25 


16 


86 


327 412 


96296 


4.28 


7 


95 


317 413 


95884 


4.31 


8 


106 


308 414 


95471 


4.33 


9 


117 


297 414 


95057 


4.36 


30 


130 


285 415 


94643 


4.38 


1 


144 


275 419 


94228 


4.45 


2 


160 


261 421 


93809 


4.49 


3 


179 


249 428 


93388 


4.58 


4 


198 


238 436 


92960 


4.69 


35 


220 


226 446 


92524 


4.82 


6 


242 


213 455 


92078 


4.94 


7 


288 


201 469 


91633 


5.12 


8 


296 


188 484 


91154 


5.31 


9 


32.3 


175 501 


90670 


5..53 


40 


360 


164 524 


90169 


5.81 


1 


396 


152 548 


89645 


6.11 


2 


436 


141 577 


89097 


6.47 


3 


478 


128 606 


8S520 


6.80 


4 


526 


117 643 


87914 


7.32 


45 


570 


107 677 


87271 


7.70 


6 


630 


96 726 


86594 


8.39 


7 


690 


87 777 


85868 


9.05 


8 


754 


78 832 


85091 


9.78 


9 


820 


69 889 


84259 


10.55 


50 


893 


61 954 


83370 


11.44 


1 


967 


53 1020 


82416 


12.37 


2 


1051 


47 1098 


81396 


13.49 


3 


1138 


41 1179 


80298 


14.68 


4 


1231 


36 1267 


79119 


16.01 


55 


1324 


30 1354 


77852 


17.39 


6 


1425 


25 1450 


76498 


18.95 


7 


1526 


52 1548 


75048 


20.62 


8 


1636 


18 1654 


73500 


22.41 


9 


1745 


15 1760 


71846 


24..54 


60 


1856 


12 1868 


70086 


26.65 


1 


1970 


11 1981 


68218 


29.04 


2 


2085 


9 2092 


66237 


31.59 


3 


2201 


7 2208 


64145 


34.42 


4 


2312 


6 2318 


61937 


37.41 


65 


2420 


5 2425 


59619 


40.67 
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Age 


Fl{x) 


FlUx) 


d^ 


Ix 


lOOOaa- 


66 


2522 


4 


2526 


57194 


43.62 


7 


2620 


3 


2623 


54668 


47.97 


8 


2706 


2 


2708 


52045 


52.02 


9 


2779 


2 


2781 


49337 


56.37 


70 


2848 


1 


2849 


46556 


61.19 


1 


2896 





2896 


43707 


66.25 


2 


2922 





2922 


40811 


71.60 


3 


2937 





2937 


37889 


77.51 


4 


2930 





2930 


(34952 


83.93 


75 


2898 





2898 


32022 


90.51 


6 


2848 





2848 


29124 


97.80 


7 


2772 





2772 


26276 


105.48 


8 


2662 





2662 


23504 


113.53 


9 


2553 





2553 


20836 


122.50 


80 


2412 





2412 


18283 


131.95 


1 


2251 





2251 


15871 


141.84 


2 


2080 





2080 


13620 


152.76 


3 


1895 





1895 


11540 


164.21 


4 


1709 





1709 


9645 


177.19 


85 


1504 





1504 


7936 


189.52 


6 


1311 





1311 


6432 


203.82 


7 


1125 





1125 


6121 


219.68 


8 


947 





947 


3996 


236.99 


9 


775 





775 


3049 


254.18 


90 


624 





624 


2274 


274.41 


1 


489 





489 


1650 


296.36 


2 


374 





374 


1161 


322.14 


3 


264 





264 


787 


335.45 


4 


197 





197 


523 


376.67 


95 


135 





135 


326 


414.11 


6 


89 





89 


191 


465.97 


7 


49 





49 


102 


480.39 


8 


29 





29 


53 


547.16 


9 


18 





18 


24 


780.00 


100 


6 





6 


6 


1000.00 



It will be of interest to compare these latter values with the 
original values of q^ as derived by Mr. Henderson's graduation. 
Such a comparison is shown in the appended table for quin- 
quennial ages. 

Ages Henderson's qx Fisher's a« 

15 3.46 3.21 

20 3.92 4.03 

25 4.31 4.25 

30 4.46 4.38 

35 4.78 4.82 
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A^es 


Hendoaui's «• 


Fi^«r'<»i 


40 


O.S4 


5.S1 


45 


7.94 


7.76 


50 


11.5S 


11.44 


o5 , . 


17.47 


17.39 


60 


i?6.t5S 


26.65 


Co 


40-66 


40.67 


TO 


61.47 


61.19 


75 


91.94 


9a51 


SO 


135.74 


131.95 


S3 


197.07 


1S9.02 


90 


2S0.35 


274.41 


95 


3S7.711 


414.11 


100 


c«2.50 


1000.00 



I think that every unbiased critic will admit that there exists 
a satisfactory agreement between the two tables in spite of the 
fact that we have wwked throughout with ba^c data in o-year 
age groups. McM•eo^"e^. the actual arithmetical work in the 
case of the gradoation by means of cami>ouaded Gram or 
Charlier curves is much simpler than the mual methods of 
graduation by Makeham's formula and mechanical intopola- 
tion formulas as employed by Mr. Hendoson.* Another i>oint 
speaking in favor of the frequaacy curve graduation is that our 
resulting fimctions are continuous functions for which standard 
tables of definite integrals have been prepared. It is thCTefcae 
possible to use the elegant and continuoi^ method <HiginaIly 
introduced by Mr. "Woolhouse in the computation of jtremiunis 
and policy values. Unf ort\inately this is not the place to treat 
this interesting phase erf the question, althou^ we may in i>ass- 
ing it mention that a graduation of the kind as here presented 
in practical computations of poBey values and premiums is 
even easier to work with than the renowned graduation formula 
by Makeham. especially in the ca^ of life contingendes involv- 
ing 2 or more hves. 



130. Additional Examples. — ^As another illustration I pre- 
sent the following frequency distribution (arranged in groups <rf 
3-year interva]s> of the ages of a group of 19.274 mafe em- 
ployees of the Bell System of the Amakan Telephone and 

*I do not Tish to imply tliese remaiks a$ » criticism of tbe able graduatkm by 
Heade^^oll, bow^T«r, 
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Telegraph Company, which most kindly has been furnished 
to me through the courtesy of this company. 

Age Distribution of Male Employees in the Bell System 



Ages 


X 


F(x) 


Ages 


X 


FCx) 


13-15 





1 


46-48 


11 


380 


16-18 


1 


9 


49-51 


12 


272 


19-21 


2 


745 


52-54 


13 


186 


22-24 


3 


2,264 


55-57 


14 


141 


25-27 


4 


3,828 


58-60 


15 


110 


28-30 


5 


3,801 


61-63 


16 


72 


31-33 


6 


2,711 


64-66 


17 


43 


34-36 


7 


1,918 


67-69 


IS 


17 


37-39 


8 


1,339 


, 70-72 


19 


14 


40-^2 


9 


884 


73-75 


20 


3 


43-45 


10 


533 


76-78 


21 


2 



Choosing the provisional lower limit at age 14 we find the 
following values for the crude moments or power sums s. 

So = 19,274, si = 112,363, Sa = 794,771, S3 = 6,790,761 

The values of the semi-invariants are 

;ii = 5.830, ;i2 = 7.2478, ;i3=27.4191. 

The resulting cubic expansion is therefore 

27.419)7' - 157.592>72 =880.731 

for which the solution is >7 = 6.1185. 

We have furthermore 

6.1185 = 6""+ ^•^"' 

7.2478 = e''"'-""(e"'-l), or 

?i2 = 0.1768, w = 0.4205 and m = 1.5462 

On the basis of an interval of one year we have therefore : 

z = [log(x) - 13.1) -2.645] :0.421* 

as the value of the variate in the generating function <po{z). 

We have m = 1.5462 + log 3 = 2.645. 
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The values of ko, ka and ^4 as determined by the method of 
least squares are Ao = 3064.4, fcs =45.1, A;4 = 80.5, on basis of one 
year interval. 

A comparison between the calculated and observed values 
(the latter being shown by single ages) is given in the attached 
diagram, which evidently is satisfactory for all practical pur- 
poses. I wish here to mention that an attempt by some of the 




FIGURE -l 

Diagram sho'wiDg comparison between observed and theoretical frequency distribution of active 

group of male employees of the Bell System. 



statistical assistants of the A. T. & T. Co. to fit the above data 
by means of the Pearsonian curves proved futile. Personally 
I have not as yet made an attempt to verify this negative 
result. 

As a final illustration we quote from J0rgensen's monograph 
an application of the logarithmic transformaion of the pre- 
viously discussed observations on the number of petal flowers 
in Banunadus Btdbosus. Since the variate in this instance is 
integral, the observations themselves clearly indicate that there 
must be a lower limit, or biological zero so to speak, at 4 petal 
flowers. 

The crude moments are then 

5o = 222, 5i = 862, ^2 = 794, from which we obtain 
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m = 


= 0.0440 










n = 


: 0.5445 










ko = 


183.2, 






SO 


that the formula reads 










F(x) = 


183.2 
.5445 V"2 


1 riog ( 


x—i)— 


.044- 




IT 


0.5445 





The detailed calculations according to this formula are shown 
below: 



(1) 


(2) 


(3) 


(4) 


(5) 


log nat (x-i) 


log U— 4)-m 


(2):n 


^oW 


F(z) 


.0000 


- .0440 


- .0810 


.3989 


131.8 


.6932 


+ .6492 


+ 1.1926 


.1965 


66.1 


1.0986 


1.0546 


1.9373 


.0612 


20.6 


1.3863 


1.3423 


2.4658 


.0191 


6.4 


1.6094 


1.5654 


2.8756 


.0064 


2.2 


1.7918 


1.7478 


3.2107 


.0024 


0.8 



A closer fit could of course be had by adding additional terms 
to the series, but even with one term the agreement between 
calculated and observed values is quite satisfactory for all 
practical purposes. 



CHAPTER XVII 
Frequency Curves and Their Relation to the Bernoullian Series 

131. The Bernoullian Series. — In Chapter IX it was 
shown that the general term 



<pix) 



-0'^ 



in the point binomial {p+q)% where p is the a priori probability 
for the happening of an event B in a single trial, represents 
the probability that E will happen x times and the comple- 
mentary event, E, s—x times. We also found that the maxi- 
mum term in the Bernoullian expansion of the point binomial 
could be written as : 

1 

V 2Trspq. 
when s is a large number. 

We wish now also to find a more simple expression for the 
general term, <p(x), instead of the laborious expression involving 
factorials of high order. 

It is evident that ip{x) represents a frequency function of an 
integral variate x which can assume all positive integral values 
from to s, and which satisfies the property of all frequency 
curves that 

We may therefore write <p{x) in the form of a Gram-Charlier 
frequency series as 

(p(x) =2cif,(a;) for t = 0, 3, 4, 5 . . . 

This involves the computation of the semi-invariants \(x) 

for r = 0,3, 4, 5 

By the definition of the semi-invariants we have : 
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262 FREQUENCY CURVES AND BERNOULLIAN SERIES [131 
where a; = 0, 1, 2, 3 s and 1if(x) = ip+qy = 1, or 

soe"- '•' " " '=<pi0) + <pa)e''+<f{2)e^"+. . .+f{s)e"' 

which for w = reduces to so = (p+qy = 1. 

Taking the logarithm on both sides of the above equation 
we have 

., , /llW ^201^ /l.ICo' , / ,, N 

/(co) =y, +-2!' + "3r+ • • • =« ^°S (pe"+q) 
Now it is easily seen that 

£»i,/(c<))=(^r_^-^y or Dif(w) (pe'^+q)=spe" 

from which we find 

Dlfi<^)q=pe'^[s-Dlf(c^)] 

Dlfico)q=pe"[s-Dlf{c) -Dlf{a:)] 

Dlf{o^)q = pe^'[s-Dlfic^) -2 DlJ{c^) -Dlf{w)] 

DtKoo)q=pe"[s-Dlf{o^) -3 Dlf{c,) -3 DIJ{o:) -Dtf{c^)l 



where 



d I Aoj A2C0 A,,a)" 



«:/(») =-l-+^+~V. 



Letting oj =0 we have therefore successively 

'kiq = pis-'Ki) 

\2q = p{s-\i-'K2) 

'hq = p{s -11-212-1}) 

Xtq = p{s -11-312-5X3 — 14) 
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or 

■Kz=spq{q-p) 
X4=sy>g(l — e pq) 



The generating function <po{x) in the Gram-Charlier series 
may hence be written as 

crV27r V27rspg 

while the coefficients Ca and a of the third and fourth derivatives 
of <fo{x) according to the formulas from paragraphs 113 and 114 
take on the form 

(-1)3 _ 

C3= — ^^spq{q-p) = -{^ spq) ^{q-p)\Z\ 

(7 o i 

(-1)4 

C4= — ^s'pq{\-&pq)={spq) Hl-6pg):4! 

which serve as measures for skewness and excess. 

Since p and q are proper fractions whose product never can 
exceed }/i it is readily seen that for large values of s both Cs and 
Ci will become very small quantities unless either p or q should 
be so small that the product sp (or sq) itself would be small 
even when s is a large number, a case which we presently shall 
discuss in detail. 

Apart from this exception the expression for the frequency 
function 

Fix)=(Poix)+C3'P3ix)+Ci'Pi{x)+ . . . 

approaches therefore the normal probability curve of Laplace 
whenever s is a large number. 

When, on the other hand, s in the point binomial (p+qY is 
not a large number both Cs and d play an important role as the 
necessary correction factors. (For p = q = }4 all the semi- 
invariants of uneven order vanish.) 
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The normal form of the point binomial, or ^o(x),was already- 
established by Laplace who also worked out a more accurate 
expression for skew binomials, which expression can be shown 
to represent the two terms ipoix) and C3(fi{x) 

As an illustration of the above formulas we shall now try to 
express a few Bernoullian point binomials by means of a 
Gram-Charlier series. 

Let us for instance try to express (.05 +.95)^°° by a Gram- 
Charlier series. We have in this case the following values for 
the parameters 

Xi = 5.0, VI2 = a = 2.1795, C3 = -0.0688, C4 =0.00625 

The substitution of these values in the Gram-Charlier series 
results in the following relative frequency distribution : 



x 


F(x) 


X 


F{x) 





.0084 


8 


.0614 


1 


.0312 


9 


.0343 


2 


.0763 


10 


.0179 


3 


.1356 


11 


.0081 


4 


.1812 


12 


.0031 


5 


.1865 


13 


.0009 


6 


.1522 


14 


.0003 


7 


.1028 


15 


.0001 



A similar calculation in the case of the Bernoullian binoni..,! 
CO.l +0.9)1°'' gives 

;ii = 10, (T = 3, C3 = - .0445, C4 = 0.0021 

with the following distribution: 



X 


l'\x) 


X 


F{x) 





.0000 


12 


.0984 


1 


.0004 


13 


.0732 


2 


.0020 


14 


.0502 


3 


.0065 


15 


.0322 


4 


.0162 


16 


.0194 


5 


.0333 


17 


.0109 


6 


.0581 


18 


.0058 


7 


.0875 


19 


.0027 


8 


.1145 


20 


.0012 


9 


.1318 


21 


.0005 


10 


.1338 


22 


.0002 


11 


.1211 


23 


.0001 
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We shall presently have occasion to compare these distribu- 
tions with those obtained from a direct expansion of the point 
binomial. 

132. Poisson's Exponential. The Law of Small Num- 
bers. — In certain statistical series it frequently happens that 
the semi-invariants of higher order than zero all are equal, or 
that 

A^ = A2 ^ A3 ^ . . . . ^ Aj. ^ A, 

We shall for the present limit our discussion to homograde 
series where the variate is always positive and integral, and 
where therefore the definition of the semi-invariants is of the 
form: 

Xo) Xaj2 XcijS 

= f(0)e°"+<pa)e'"+fi2)e^"+<p{3)e^''+ . . . , or 

Xcu Xa)2. XojS 

e^' '^'- ^' =e^'^e'^«" = i;^(x)e^"for x = 0, 1,2,3 ... , 

which also can be written as 

1+-^+-^+ J =^(0)l+^(l)e"+^(2)e^"-t- 

The coefficient of e""" gives the relative frequency or the 
probability for the occurrence of x = r, and we henceforth find 
that 

f(a;)=^(r)= j- 

This is the famous Poisson Exponential, so called after the 
French mathematician, Poisson, who first derived this expres- 
sion in his Recherches sur la Probabilite des jugesments, but in 
an entirely different manner than the one we have indicated 
above. 

The Poisson Exponential opens now a way to the treatment 
of the point binomial in the exceptional cases where the product 
sp (or sq) is small even when s is a very large number, or when 
more strictly speaking the expression 

lim sp = ^ 
where a is a finite number. 
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Under such conditions p (or q) must approach zero and its 
complementary probability q (or p) must approach unity as 
their limiting values. 

The expressions for the semi-invariants as given in paragraph 
131, i. e. 

/li=sp 

/^2=spq 

^3 = spq{q-p) 

Xi = spq{l-6pq) 



will under these conditions all approach the limit sp, and the 
general term in the Bemoullian expansion of the point binomial 
can therefore be expressed by means of the Poisson exponen- 
tial. 

In all cases where the semi-invariants of various orders hap- 
pen to be equal, or very nearly equal, the formula by Poisson 
will be preferable in place of the more general expansion by the 
Gram-Charlier series. 

As an illustration we may select the simple binomial (.001+. 
.999) ""' where the semi-invariants have the following values: 

2,, =0.1, ;i2 = 0.0999, ;i3 = 0.099702, ;i4 = 0.0994006, 

and therefore may be considered as being nearly equal. 

The general term, <p{x), in this particular point binomial can 
therefore be written as a Poisson exponential of the form: 

^(x)=,^(r)=e-'''0.r:r! for r = 0, 1,2,3 .... 

The Russian statistician, Bortkewitsch, has given in his in- 
teresting and scholarly brochure Das Gesetz der kleinen Zahlen 
(1898) a four decimal place table of the Poisson exponential 
e~^ ?J:r\ for values of X. from 0.1 to 10.0. The English biometri- 
cian, Soper, in 1914 published a 6 decimal place table from 
/I = 0.1 to /^ = 15.0. This table is found in Pearson's well-known 
Tables for Biometricians. For the above mentioned Bernoul- 
han point binomial (0.001+0.999)^°°, corresponding to the 
Poisson exponential e~°^0.1'':r!, we find from Soper's table the 
following values of \j/ir). 
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r 


T|/(r) 





.904837 


1 


.090484 


2 


.004524 


3 


.000151 


4 


.000004 



While the exponential of Poisson requires theoretically at 
least, that the semi-invariants must all be of the same magni- 
tude, it will, however, often be found that this exponential will 
give a fair approximation to the true observed values of the 
frequency curve in cases where the semi-invariants ?.!, Xj, X3, 
^4 . . . do not differ greatly from each other. In this connec- 
tion it is of interest to compare the fits of Poisson's exponential 
and the Gram-Charlier series with the true values in the binom- 
ial expansion in the three examples we have given above. 
Through the courteous efforts of my translator and co-editor, 
Miss C. Dickson, the three point binomials (0.001 +0.999)1"°, 
(O.OS+O.gS)!"" and (0.10+0.90)1°" h^ve been expanded directly 
and the results as compared with the forms of Poisson and of 
Gram-Charlier are shown in the following tables: 

Values of ^(x) in Vahious Poixt Binomials 
TABLE I (0.001+0.999)100 





X 




Binomial 




Poisson 











.9048 




.9048 






1 




.0906 




.0905 






2 




.0045 




.0045 






3 




.0001 




.0002 






4 




.0000 




.0000 








TABLE II (0.05+0.95)100 






£ 




Binomial 




Gram-Charlier 




Poisson 







.0059 




.0084 




.0067 


1 




.0312 




.0312 




.0337 


2 




.0812 




.0763 




.0842 


3 




.1396 




.1356 




.1404 


4 




.1781 




.1812 




.1755 


5 




.1800 




.1865 




.1755 


6 




.1500 




.1522 




.1462 


7 




.1060 




.1028 




.1044 


8 




.0649 




.0614 




.0653 


9 




.0349 




.0343 




.0363 


10 




.0167 




.0179 




.0181 


11 




.0072 




.0081 




.0082 


12 




.0028 




.0031 




.0034 


13 




.0003 




.0009 




.0013 


14 




.0001 




.0001 




.0005 



268 FREQUENCY CURVES AND BERNOULLIAN SERIES [132 
TABLE III (0.1+0.9)100 



z 


Binomial 


Gram-Charlier 


Poisson 





.0001 


.0000 


.0000 


1 


.0003 


.0004 


.0005 


2 


.0016 


.0020 


.0023 


3 


.0059 


.0065 


.0076 


4 


.0159 


.0162 


.0189 


5 


.0339 


.0333 


.0378 


6 


.0596 


.0581 


.0630 


7 


.0889 


.0875 


.0901 


8 


.1148 


.1145 


.1125 


9 


.1304 


.1318 


.1251 


10 


.1319 


.1339 


.1251 


11 


.1199 


.1211 


.1137 


12 


.0988 


.0984 


.0948 


13 


.0743 


.0732 


.0729 


14 


.0513 


.0502 


.0521 


15 


.0327 


.0322 


.0347 


16 


.0193 


.0194 


.0217 


17 


.0106 


.0109 


.0128 


18 


.0054 


.0058 


.0071 


19 


.0026 


.0027 


.0037 


20 


.0012 


.0012 


.0019 


21 


.0005 


.0005 


.0009 


22 


.0002 


.0002 


.0004 


23 


.0000 


.0000 


.0002 



These tables need no further explanation and demonstrate 
the great graduating ability of the Posison function in the 
case of point binomials. For although the Gram-Charlier 
functions give better results in the last example, we must on 
the other hand not forget that we had to determine 4 para- 
meters, /li, a, Ca and d, while Poisson's exponential requires only 
the determination of the single parameter /I. The great draw- 
back of the Poisson exponential lies in the fact that it is a dis- 
crete function, which exists only for positive integral values of 
the variate. It therefore does not lend itself so readily to in- 
tegration as the Laplacean probability function. 

It is the great achievement of the eminent Swedish astrono- 
mer and statisitican, Charlier, to have been the first to attempt 
to find a continuous function possessing the same powerful and 
flexible characteristics as the Poisson exponential. Charlier 
has introduced as a generating function a certain curve type 
which is expressed by the following formula: 
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IT 



'/'W =^m.M = / e^^'^cos [nsino}-xu]do} 

o 

Using the above expression as a generating function Charlier 
has shown that any frequency curve can be expressed by the 
series 

F(x)=^(x)-^,A^(x)+|a^(x)-|a^i|,(x)+ .... 

At the time when Charlier introduced this function in the 
theory of frequency curves in a little pamphlet Weiteres uber das 
Fehlergesetz he gave only a rather short sketch of the method. 
He and the Danish actuary, J0rgensen, have, however, of late 
years further developed the method so as to make it useful in 
practical computations. J0rgensen has in this respect done some 
very neat work in supplying a method for determining the con- 
stants, A;, in the above series by means of semi-invariants 

We intend to treat these investigations in the forthcoming 
second volume of this treatise. In the meantime, however, 
students who encounter skew frequency distributions in their 
work will in nearly all cases be able to overcome the practical 
difficulties of a graduation by means of the logarithmic trans- 
formation of the variate, treated in the earlier chapters of this 
book. 



133. The Law of Small Numbers. — In the case of integral 
variates we wish, however, to call the reader's attention to cer- 
tain properties of the Poisson exponential, or probability func- 
tion, which have been apparently overlooked or misunderstood 
by several writers, especially among the English biometricians. 
Somehow or other the impression among those writers has been 
that the frequency curve or probability function of Poisson is 
invariably connected with the expansion of the point binomial. 
Thus for instance Mr. Yule in his well-known Introduction to 
the Theory of Statistics treats it as such, while Lucy Whitaker 
in an article entitled "On the Poisson Law of Small Numbers" 
in Biometrika for April, 1914, subjects the whole theorem to 
a scathing criticism. We quote the following sentence from 
Miss Whitaker's article: "It might be supposed, although 
erroneously, that the Poisson-Exponential formula was capable 
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of great accuracy in addition to its great simplicity. But this 
is to neglect the fundamental assumptions on which it is based, 
namely: 

(1) That the data actually correspond to a binomial 

(2) That in the binomial q is small and n large." 

It is true that Poisson in deriving his formula started from 
the Bernoullian point binomial so as to meet the cases where 
the normal probability curve of Laplace failed to give a close 
approximation, that is in the case of the limiting value when 
either p or q becomes very small, but s is large enough so that 
sp or sq remain finite; and it is probably this fact which has 
prompted the above remarks of Miss Whitaker. But this is 
really to put the cart before the horse. In the case of integral 
variates we can, as shown in the preceeding paragraphs, derive 
the Poisson probability function as a general form of frequency 
distributions whose semi-invariants are all equal, and it is only 
incidental that this property is possessed by the special binomial 
limit in the case mentioned above. But this property is not a 
general property of the binomial any more than it is the property 
of the same point binomial function to result in a Laplacean 
normal probability curve when the exponent s is large and 

Looked at wholly from the point of view of frequency func- 
tions it is, however, not necessary to resort to the binomial 
limit as a base for the derivation of Poisson's formula, which 
can be derived directly from the definition of the semi-invariants 
when those peculiar parameters are considered as being of 
equal magnitude irrespective of their order. Now the question 
presents itself whether the Poisson probability curve might 
not, like its fellow brother, the Laplacean probability curve, 
be used as a generating function in expanding certain types of 
frequency distribiitions in serial form. It is to this question 
that the discussion is devoted in the following chapter. 



CHAPTER X\l.ll 

POISSOX-CHART.TRR FREQUENCY C^E^■ES FOR INTEGRAL VARL4.TES 

134. Charlier"s B Curve. — ^^"e have already seen in the pre- 
vious chapters that the Gram-Charlier frequency cun-e could be 
written as 

FvJ)=:ic,vAJ-)=3:ciH>,r cc,r^ for i = 0, 1. 2, 3, ... . 

where ^- .r) is the generating Laplacean probability function. 
The idea now immediately suggest* itself to use a s imil ar 
method of expansion in the case of the Pois^on probability 
function and to employ this exponential as a generating func- 
tion in the same manner as the Laplacean function. We are, 
however, in the present case of the Poisson exponential dealing 
with a generating function which so far has been defined for 
positive integral values only and, therefore, represents a dis- 
crete function. For this reason it will be imi>ossible to express 
the series as the sum-products of the successive derivatives of 
the generating function and their conflated parameters c. We 
can, however, in the case of integral variates express the series 
by means of finite differences and write F{x) as follows: 

F,x'^=C(u.x')+eiSx!f{i)^CtA\!f(x' - .... (7) 

where \li\X^ =f~~"»i^: x! for x=0. 1, 2. 3 , and 

lu I =1. 

a-:.j:)=-;vx)-vu-i\ 

A> U) = Au a' -AsIU-l) =V vJ -2v V J-1) -V U-2 

The series (It is known as the Poisson -Charlier frequency 
series or Chai'lier s B type of frequency curves. 

The semi-invariants of these frequency series are given by the 
following relation: 

C ' 2! 3! =^[c:C{x''+Ci:iC,I:^CtS'4\X - ] e'^ 

x=o 

271 



272 POISSON-CHARLIER FREQUENCY CURVES [134 

Expanding and equating the co-efficients of equal powers 
of w we have: 

Xo = 1 = co2^(a;) or co = 1 

;ii = 2a; (Hx) +CrAHx) +C2AV(a;) + ) (//) 

V+;i2 = 2«2(^(a;)-t-CiAi//(a;)+C2A2i//(a;)+. . . .) 



We now have 

2i//(a;) = 1, and 

2xi/>(a;) =2me~'"TO''~^:(a;-l)! = m2»/>(a;— 1) =m. 

We also find from well-known formulas of the calculus of 
finite differences that* 

2a;n/>(a;)=m^-|-TO 
2xAi/»(x) = — 1 
2xA^(x)=0 
Xx^Axfi{x) = --(2m+l) 
Xx'A\li{x)=2 

Substituting these values in (7/) we obtain 

V+^2 = m'-t-m-(2m+l)ci+2c2 

By letting m = Xj we can make the coefficient Ci vanish, which 
results in 

C2 = }^l^2 — m] 

where the two semi-invariants Xi and /I2 are calculated around 
the natural zero of the number scale as origin. 

For the above discussion we have limited ourselves to the 
determination of the three constants m, Co and C2. It is easy, 
however, to find the higher parameters Cs, Ct, Cs, . . . . from 
the relations between the moments of the Poisson function and 
the semi-invariants of order 3, 4, 5, . . . etc. Charlier usually 

*These formulas can also be derived from the definition of the semi-invariants and 
the well-known relations between moments and semi-invariants as given on page 237, 
when we remember that according to our deflnition all semi-invariants in the Poisson 
exponential are equal to m. 
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calls the parameter m the modulus and Cz the eccentricity of the 
B curve. 

135. Numerical Examples. — As an illustration of the appli- 
cation of the Poisson-Charlier series we select the previously 
mentioned series of observations on alpha particles radiated from 
a bar of Polonium as determined by Rutherford and Geiger 
and shown on page 174 of this treatise. We are here dealing 
with integral variates which can assume positive values only 
and the observations are therefore eminently adaptable to the 
treatment by Poisson-Charlier curves. Selecting the natural 
zero as the origin of the co-ordinate system we find that the 
first two semi-invariants are of the form 

;ii =3.8754, ^2 =3.6257, and we therefore have: 

m = Xi=3.88; C2 = y2[^2-m] = -0.125 

The equation for the frequency distribution of the total N = 
2608 elements therefore becomes 

f(x)=M^3.88(x) + (-0.125)A^i//3.88(a;)]. 

The table below gives the values as fitted to the curve, F{x) : 

Alpha Particles Discharged from Film op Polonium 
(Rutherford and Geiger) 







iV=2608, TO=3.88, 


C2= -0.125 






(1) 


(2) 


(3) 


(4) 


(5) 


(6) 


X 


'iHx) 


A2<Kl) 


NX(.2) 


iVX(3)XC2 


(4) + (5) 





.020668 


-I-.020668 


53.9 


- 6.7 


47 


1 


.080156 


-I-.038820 


209.0 


-12.7 


196 


2 


.155455 


-F.015811 


405.4 


- 5.2 


400 


3 


.201015 


-.029739 


524.2 


+ 9.7 


533 


4 


.194967 


-.051608 


508-5 


+ 16.8 


525 


6 


.151265 


-.037654 


394.5 


+ 12.3 


407 


6 


.097850 


-.009714 


254.9 


+ 3.2 


258 


7 


.054249 


+ .009814 


141.2 


- 3.2 


138 


8 


.026316 


+ .015668 


68.7 


- 5.1 


64 


9 


.011351 


+ .012968 


29.6 


- 4.2 


25 


10 


.004407 


+ .008021 


11.5 


- 2.6 


9 


11 


.001.555 


+.004092 


4.1 


- 1.2 


3 


12 


.000503 


+.001800 


1.3 


- 0.6 


1 


13 


.000150 


+ .000699 


0.4 


- 0.2 





14 


.000042 


+.000245 


0.1 


- 0.1 





15 


.000010 


+.000076 


0.0 


- 0.0 





16 


.000003 


+.000025 









17 


.000001 


+.000005 
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Bateman has in Philosophical Transactions (1902) given a 
theoretical frequency distribution of the above series of observa- 
tions wherein he develops the Poisson probability function, 
being ignorant of the previous demonstration by Poisson. In 
a later note he mentions that the formula was given by the 
French mathematician in his work on probabilities, published 
in 1837. 

Bateman's calculation includes, however, only the first term 
of the Poisson-Charlier series and is, therefore not so close as 
the above fit. 

As a second example we offer our old friend, the distribution 
of flower petals in Ranunculus Bulhosus. Selecting the zero 
point at X = 5 and computing the semi-invariants in the usual 
manner we obtain the following equation for the frequency 
curve. 

F{x) =222i/;(x)-F31.5aXx), m =0.631 

A comparison between calculated and observed values fol- 
lows below: 



X 


F{x) 


Obs. 


5 


134.9 


133 


6 


51.6 


55 


7 


22.5 


23 


8 


9.5 


7 


9 


2.9 


2 


10 


0.6 


2 



136. Transformation of the Variate. — For integral vari- 
ates we have shown that the Poisson frequency curve possesses 
the important property that all its semi-invariants are equal. 
Now while a frequency distribution of a certain integral variate, 
X, may perhaps not possess this property, it may, however, 
very well happen after a suitable linear transformation has been 
made, that the variate thus transformed will be subject to the 
laws of Poisson's function. 

Let z = ax — h represent the linear transformation which is 
subject to the above laws with a series of semi-invariants all 
equal to m. 

These semi-invariants according to the properties set forth 
in paragraph (104) are therefore 
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m = /li(z) =a'ki{x)—h 
m = 'ki{z) =a'/l3(x) 



and our problem is to find the unknown parameters a, b and m. 
Simple algebraic methods, which it will not be necessary to 
dwell upon, give the following results: 

a = A2;a3 

m = V '-^3^ 

h = a'ki—m 

As a numerical illustration of this transformation we choose 
from J0rgensen a series of observations by Davenport on the 
frequency distribution of glands in the right foreleg of 2000 
female swine. 

No. of Glands 0123456789 10 
Frequency . . 15 209 365 482 414 277 134 72 22 8 2 

The values of the three first semi-invariants are 

;ii =3.501, ^2 =2.825, ;i3 =2.417, 

a =2.825:2.417 = 1.168 

TO =2.825^:2.417^ =3.859 

6 = (1.168) (3.501) -3.859 =0.230. 

The new variable then becomes z = ax — h and the transformed 
Poisson probability function takes on the form: 



»/'(2)=- 



In general, however, we will find that z is not a whole nmnber 
and the expression z! therefore has no meaning from the point 
of view of factorials at least. This difficulty may, however, 
be overcome through the introduction of the well-known 
Gamma Function, r(z+l), which holds true for any positive 
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or negative real value of z and which in the case of integral 
values of z reduces to r(z+l) =z! 

Hence we can write the transformed Poisson probability 
function as 

Tables to 7 decimal places of the Gamma Function, or rather 
for the expression —log r(z+l), have been computed by J0r- 
gensen in his aforementioned book from 2=— 5toz = 15, pro- 
gressing by intervals of 0.01. 

By means of this table and the tables of ordinary logarithms 
it is now easy to find the values of ^{z) in the case of the example 
relating to the number of glands in female swine. The detailed 
computation is shown below.* 



(1) 


(2) 


(3) 


(4) 


(6) 


(6) 


7 


X 


2 — 


-iogr(2+i) 


log OT^ 


(3) + (4)+log^™ 


tA(^) 


F[x) 





-.230 


.9209 


.8651 


.1101-2 


.0129 


30.1 


1 


+.938 


.0108 


.5500 


.8849-2 


.0767 


179.2 


2 


2.106 


.6555 


.2350 


.2146-1 


.1639 


382.9 


3 


3.274 


.0679 


.9199 


.3119-1 


.2051 


479.1 


4 


4.442 


..3216 


.6048 


.2505-1 


.1780 


415.8 


5 


5.610 


.4547 


.2897 


.0685-1 


.1171 


273.6 


6 


6.778 


.4904 


.9746 


.7891-2 


.0615 


143.7 


7 


7.946 


.4446 


.6595 


.4282-2 


.0268 


62.6 


8 


9.114 


.3285 


.3444 


.9970-3 


.0099 


23.1 


9 


10.282 


.1506 


.0294 


.5041-3 


.0032 


7.5 


10 


11.4.50 


.9177 


.7143 


.9561-4 


.0009 


2.1 



137. The BernouUian Series Expressed by B Curves. 

In the case of the BernouUian point binomial we have /li = sp and 
^2 =s'pq. If we now wish to express the general term, v^(x), in 
the binomial by a.Poisson-Charlier series we evidently have 

f (x) =i//(a;)+C2A^i/'(a;) where 
i//(a;) =er™''m":x\ 

Now m = ^1 = sp, and c^ = }i{^2—'>n) = }^{spq—sp} = — }yisp^. 
As an illustration of this expansion we may again look at the 

*The characteristics of the logarithms have been omitted in this table (except In 
column 5) and only the positive mantissas are shown. Coliunn 7 represents the 2000 
individual observations pro rated according to column 6. 
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point binomial (0.1+0.9)i™, discussed on page (268). There 
we have: 

m = 10, C2 = — Yi, while 

if(x) =\|/io(x) -3/^A^i//io(x). 

The actual computation by means of this formula results in 
the following tabular representation. 



X 


Poisson-Charlier 


X 


Poisson-Charlier 





.0000 


12 


.0986 


1 


.0002 


13 


.0744 


2 


.0016 


14 


.0514 


3 


.0059 


15 


.0330 


4 


.0159 


16 


.0197 


5 


.0340 


17 


,0108 


6 


.0599 


18 


.0055 


7 


.0893 


19 


.0025 


8 


.1149 


20 


.0011 


9 


.1301 


21 


.0005 


10 


.1314 


22 


.0001 


11 


.1194 


23 


.0000 



A comparison of this series with the actual expansion of the 
point binomial by Miss Dickson on page (268) leaves little to 
be desired in the way of exactitude. The fit to the true binomial 
is even closer than that of Gram-Charlier series in spite of the 
fact that only two parameters enter into the determination of 
Poisson-Charlier's curves while four parameters are required 
for the Gram-Charlier curves. 



138. Remarks on Mr. Keynes' Criticisms. — From the above dis- 
cussion it is evident that the Bernoullian point binomial can always be 
represented without difficulty by either the Gram-Charlier or the Poisson — 
Charlier frequency curves. This point is of interest in connection with 
some wholly misleading and erroneous statements regarding Laplace's 
analysis of the Bernoulhan Theorem by Mr. J. M. Keynes, in his recently 
issued "Treatise on Probability. 

On page 358 Mr. Keynes points out the assymmetry in the Bernoullian 
expansion and claims that the want of symmetry is generally being over- 
looked, "and it is not uncommon to assume that the probability of a given 
divergence less than pm is equal to that of the same divergence in excess 
of pm, and, in general, that the probability of the frequency's exceeding 
pm in a set of m trials is equal to that of falhng short of pm." 

No real m.athematioian, and least of all Laplace, has ever claimed the 
presence of symmetry as being general in the case of the Berno.uUian 
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series. Those who have fallen into that error are economists and statisti- 
cians who like Mr. Keynes are ignorant of the true assumptions underlying 
Bernoulli's and Laplace's demonstrations. Every m.athematician knows 
for instance that the annual loss ratios for total permanent disability bene- 
fits in workmen's compensation assurance on the basis of a small sample 
payroll of say 100,000 Kroner can assume all possible real values from 
zero and upwards, and losses in excess of 100,000 are indeed possible. 
Statistics from certain Scandinavian industries show that the average 
annual loss ratio in respect to total permanent disability is about 1 Krone 
for each 100,000 Kroner of payroll exposure. We know, however, of a 
certain instance in the case of a small industrial establishment with a 
payroll of nearly 100,000 Kroner where the losses in a single year were 
more than 20,000 Kroner. In the great majority of cases, however, the 
annual losses are nil. We are, therefore, dealing with a decidedly skew 
Poisson — Charlier frequency curve. 

Mr. Keynes' example is in fact less striking. He considers the case of 
throwing aces in 60 successive throws with a die, and he remarks that the 
ace cannot appear less often than not at all, whereas it may well appear 
more than 20 times. This, of course, is self evident and realized by 
Laplace in his analysis, and there is no valid reason for dwelling at length 
on such simple matters. But the English scholar is evidently greatly 
impressed by these very simple considerations for he continues in a most 
charnaing and naive manner as follows : 

"The actual measurement of this want of symmetry and the determina- 
tion of the conditions, in which it can be safely neglected, involves labori- 
ous mathematics of which I am only acquainted with one direct investiga- 
tion, that published in the Proceedings of the London Mathematical Society 
by Mr. T. C. Simmons." 

How this charmipg and naive statement reminds one of the playful 
sophistries of a bright and impish, but not necessarily bad, small boy, 
trying to offer some excuse and explanation for his mischievous pranks, 
while he at the same time is wholly unmindful of the fact that his explana- 
tions and excuses are the most damning evidence of his own guilt. 

Here we have Mr. Keynes, a successful writer of economic subjects, posing 
as a critic of such intellectual giants in the realm of mathematical science 
as Bernoulli, Laplace and Poisson (which of course presupposes that he 
must have read very carefully the various writings of those old masters) ; 
who calmly and in the most innocent manner admits that of the "labori- 
ous" mathematics involved in this question he is only acquainted with a 
rather clumsy demonstration published in 1896. 

While I have not the slightest doubt as to the veracity of these facts 
so far as Mr. Keynes is concerned, it will be of interest to see what the 
actual historical facts are. Now in so far as the measurement of the 
assymmetry or skewness of the Bernoullian point binomial is concerned 
this was already performed by Laplace himself, an accomplishment which 
in itself creates a degree of doubt in the reader's mind as to whether Mr. 
Keynes really has studied Laplace with the necessary care required of one 
who poses as a critic of the great Frenchman. Ilarald Westergaard, the 



138] REMARKS ON MR. KEYNES' CRITICISM 279 

eminent Danish scholar, whose fame as a statistician surely rests on a far 
more secure foundation than that of Keynes takes in his Statisiikens Teori i 
Grundrids (Copenhagen, 1915), special pains to point out that Laplace 
was the first to give a mathematical measure of the slaewness in a Ber- 
noullian frequency distribution. 

The Danish actuary. Gram, long before the intellect of our recent 
English critic saw the light, derived his general series for frequency curves, 
which of course also applies to the Bernoullian case. Thiele in his Al- 
■mindelig lagttagelseslaere, at a time when the young Keynes probably 
was being piloted by his nursemaid or governess, discussed the same series 
from the point of view of semi-invariants. Later on Charlier continued 
in direct line from where Laplace and Poisson concluded their labours. 

The necessary corrections to the generating functions, whether these 
be Laplace's or Poisson's probability ciu-ves, as derived by these Scandi- 
navian writers are given in Chapters XVII and XVIII of this treatise. 
Not a single one of these demonstrations requires the "laborious" mathe- 
matics as mentioned by Mr. Keynes. The fact that our romancing 
English economist evidently is in blissful ignorance of the fundamental 
work by the Scandinavian school is, however, no excuse for his superfi- 
cial knowledge of the expansion of statistical series, since much of this 
work has appeared in English. 

Mr. Keynes' misconception of the real significance of the Law of Small 
Numbers and his criticism of Bortkewicz may possibly also be traced to 
his apparent ignorance of the work of the Scandinavian authors. His 
criticism moves practically along the same lines as that of the views held 
by Miss Whitaker and described on page 270 of this treatise. He, Hke 
Miss Whitaker, fails to realize that the generating Poisson probability 
function arises from the more general fact that all its semi-invariants are 
equal, rather than from the more special fact that one of the limiting values 
of the point binomial reduces to a Poisson frequency curve. The very 
fact that Mr. Keynes in his large voliime never mentions the semi-invari- 
ants leads one to inquire whether those important statistical parameters 
remain a closed book to him. * 



♦Similar immature views as those held by Keynes and Miss Whitaker are also ex- 
pressed by Mr. A, Mowbray in the May, 1920, Proceedings of the Casualty Actuarial 
Society of America (page 197). Judging from his tedious and laborious analysis Mow- 
bray evidently never has heard of the semi-iu variants. 
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z foW ^3(z) 9i(z) fiiz) ftiz) ffo(z)dz 



0.00 


.3989 


-hO.OOOO 


+ 1.1968 


-0.0000 


-5.9841 


.5000 


.05 


.3984 


.0597 


1.1894 


.2983 


5.9319 


.5199 


.10 


.3970 


.1187 


1.1671 


.5915 


5.7763 


.5398 


.15 


.3945 


.1762 


1.1304 


.8743 


5.5208 


.5596 


.20 


.3910 


.2315 


1.0799 


1.1420 


5.1711 


.5793 


.25 


.3867 


.2840 


1.0165 


1.3900 


4.7351 


.5987 


.30 


.3814 


.3330 


.9413 


1.6142 


4.2223 


.6179 


.35 


.3752 


.3779 


.8556 


1.8111 


3.6439 


.6368 


.40 


.3683 


.4184 


.7607 


1.9777 


3.0122 


.6554 


.45 


.3605 


.4539 


.6583 


2.1117 


2.3414 


.6736 


.50 


.3520 


.4841 


.5501 


2.2114 


1.6448 


.6915 


.55 


.3429 


.5088 


.4378 


2.2760 


.9371 


.7088 


.60 


.3332 


.5278 


.3231 


2.3052 


- .2324 


.7257 


.65 


.3230 


.5411 


.2078 


2.2995 


+ .4555 


.7422 


.70 


.3123 


.5486 


+ .0937 


2.2601 


1.1135 


.7580 


.75 


.3011 


.5505 


- .0176 


2.1888 


1.7298 


.7734 


.80 


.2897 


.5469 


.1247 


2.0880 


2.2938 


.7881 


.85 


.2780 


.5381 


.2260 


1.9604 


2.7964 


.8023 


.90 


.2661 


.5245 


.3203 


1.8095 


3.2304 


.8159 


0.95 


.2541 


.5062 


.4067 


1.6387 


3.5898 


.8289 


1.00 


.2420 


.4839 


.4839 


1.4518 


3.8715 


.8413 


1.05 


.2299 


.4580. 


.5516 


1.2529 


4.0735 


.8531 


1.10 


.2179 


.4290 


.6091 


1.0458 


4.1958 


.8643 


1.15 


.2059 


.3973 


.6561 


.8346 


4.2403 


.8749 


1.20 


.1942 


.3635 


.6925 


.6230 


4.2103 


.8849 


1.25 


.1826 


.3282 


.7185 


.4147 


4.1107 


.8944 


1.30 


.1714 


.2918 


.7341 


.2130 


3.9475 


.9032 


1.35 


.1604 


.2549 


.7399 


- .0209 


3.7278 


.9115 


1.40 


.1497 


.2180 


.7364 


+ .1590 


3.4595 


.9192 


1.45 


.1394 


.1815 


.7243 


.3244 


3.1510 


.9265 


1.50 


.1295 


.1457 


.7042 


.4735 


2.8109 


.9332 


1.55 


.1200 


.1111 


,6772 


.6051 


2.4481 


.9394 


1.60 


.1109 


.0780 


.6440 


.7181 


2.0712 


.9452 


1.65 


.1023 


.0468 


.6057 


.8121 


1.6886 


.9505 


1.70 


.0940 


+ .0175 


.5632 


.8870 


1.3079 


.9554 


1.75 


.0863 


- .0094 


.5173 


.9431 


.9363 


.9599 


1.80 


.0790 


.0341 


.4692 


.9809 


.5801 


.9641 


1.85 


.0720 


.0563 


.4195 


1.0014 


+ .2450 


.9678 


1.90 


.0656 


.0760 


.3693 


1.00.58 


- .0646 


.9713 


1.95 


.0596 


.0933 


.3192 


.9955 


.3452 


.9744 



PROBABILITY FUNCTION AND ITS DERIVATIVES 281 



z 


fo(z) 


fi{z) 


^4(2) 


^6(z) 




2.00 


.0540 


-0.1080 


-0.2700 


+0.9718 


-0.5939 


.9772 


2.05 


.0488 


.1203 


.2223 


.9366 


.8091 


.9798 


2.10 


.0440 


.1302 


.1765 


.8915 


.9899 


.9821 


2.15 


.0396 


.1380 


.1332 


.8382 


1.1362 


.9842 


2.20 


.0355 


.1436 


.0927 


.7784 


1.2488 


.9861 


2.25 


.0317 


.1473 


.0554 


.7139 


1.3291 


.9878 


2.30 


.0283 


.1492 


- .0214 


.6460 


1.3788 


.9893 


2.35 


.0252 


.1495 


+ .0092 


.5764 


1.4004 


.9906 


2.40 


.0224 


.1483 


.0362 


.5064 


1.3965 


.9918 


2.45 


.0198 


.1459 


.0598 


.4372 


1.3701 


.9929 


2.50 


.0175 


.1424 


.0800 


.3697 


1.3242 


.9938 


2.55 


.0154 


.1380 


.0968 


.3050 


1.2619 


.9946 


2.60 


.0136 


.1328 


.1105 


.2438 


1.1865 


.9953 


2.65 


.0119 


.1270 


.1213 


.1865 


1.1007 


.9960 


2.70 


.0104 


.1207 


.1293 


.1338 


1.0076 


.9965 


2.75 


.0091 


.1141 


.1347 


.0858 


.9098 


.9970 


2.80 


.0079 


.1073 


.1379 


.0429 


.8097 


.9974 


2.85 


.0069 


.1003 


.1391 


+ .0049 


.7095 


.9978 


2.90 


.0060 


.0934 


.1385 


- .0281 


.6110 


.9981 


2.95 


.0051 


.0865 


.1364 


.0563 


.5159 


.9984 


3.00 


.0044 


.0798 


.1330 


.0798 


.4255 


.9987 


3.05 


.0038 


.0732 


.1284 


.0989 


.3407 


.9989 


3.10 


.0033 


.0669 


.1231 


.1140 


.2624 


.9990 


3.15 


.0028 


.0609 


.1171 


.1253 


.1911 


-.9992 


3.20 


.0024 


.0552 


.1106 


.1332 


.1271 


.9993 


3.25 


.0020 


.0499 


.1039 


.1381 


.0705 


.9994 


3.30 


.0017 


.0449 


.0969 


.1404 


- .0213 


.9995 


3.35 


.0015 


.0402 


.0899 


.1403 


+ .0207 


.9996 


3.40 


.0012 


.0359 


.0829 


.1384 


.0561 


.9997 


3.45 


.0010 


.0318 


.0761 


.1348 


.0849 


.9997 


3.50 


.0009 


.0283 


.0694 


.1300 


.1078 


.9998 


3.55 


.0007 


.0249 


.0631 


.1242 


.1254 


.9998 


3.60 


.0006 


.0219 


.0570 


.1175 


.1380 


.9998 


3.65 


.0005 


.0193 


.0513 


.1104 


.1464 


.9999 


3.70 


.0004 


.0168 


.0460 


.1030 


.1510 


.9999 


3.75 


.0004 


.0146 


.0410 


.0954 


.1525 


.9999 


3.80 


.0003 


.0127 


.0365 


.0878 


.1512 


.9999 


3.85 


.0002 


.0110 


.0323 


.0803 


.1478 


.9999 


3.90 


.0002 


.0095 


.0284 


.0730 


.1426 


.99995 


3.95 


.0002 


.0081 


.0249 


.0660 


.1361 


.99997 



ADDENDA. 

APPENDIX AND BIBLIOGRAPHICAL NOTES. 

Chapter I. 

Page 3. The establishment of the relations between hypothetical judg- 
ments and probabilities is probably first due to F. C. Lange. See also the 
discussion in Sigw art's "Logic" (English translation, Macmillan Co., New 
York, 1904). A defense of the "principle of insufficient reason" as opposed 
to the view of von Kries is given by K. Sttjmpf ("tjber den Begriff der mate- 
matischen Wahrscheinlichkeit") Ber. bayr. Ak. {phil. Kl.), 1892. For a 
further discussion of the philosophical aspect the reader is advised to consult 
"Theorie und Methoden der Statistik" (Tubingen, 1913) by the Russian 
statistician, A. Kaufmann. 

Chapter II. 

Page 21. An interesting account of the application of the theory of proba- 
bilities to whist is given by Poole in "Philosophy of Whist Play" (New York 
and London, 1883). Page 23. Example 6. This is a, general ease of the 
so-called game of "Treize" or "Recontre" first discussed by Montm.ort in his 
"Essai sur les Jeux des Hazards" (1708). "Thirteen cards numbered 1, 2, 
3, ... up to 13 are thrown promiscuously into a bag and then drawn out 
singly; required the chance that once at least the number on a, card shall 
coincide with the number expressing the order in which it is drawn." This is 
one of the stock problems in probability and has been discussed by nearly all 
the leading classical writers on the subject. 

Chapter IV. 

The close connection between probability and symbolic logic is admirably 
discussed by the Italian mathematician, Peano, in various of his mathematical 
texts. Page 42. Example 19. See also the discussion by R. Henderson in 
"Mortality and Statistics" (New York, 1915). 

Chapter V. 

38. The moral expectation has been discussed by Harald Westbrgaard 
in "Tidsskrift for Matematik" (1878) and in " Smaaskrifter tilegnede C. F. 
Krieger" (Copenhagen, 1889). 

Chapter VI. 

A German translation with explanatory notes of Bayes's brochure has 
recently appeared in the series of "Ostwald's Klassiker." 

Page 74. The double integral in the numerator of (IV) is evidently of the 
form: 

r f PiVh yddijidyi, 

" 'J (A) 

2S3 
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where the contour of the field of integration (A) is defined by means of the 
relations : 

a < yiyi < 0, < j/i < 1 and < 2/2 < 1. 

The field of integration is thus the area swept out by the hyperbola 
yiVi = a, the straight line y^ = 1, the hyperbola y^yi = /3 and the straight 
line yi = 1. 

Changing the variables by means of the transformation: 



J/12/2 = y = v{y, z) and I — y 
we get the following new double integral 

ff FMy,z), ^{y, z)\\J\dydz 

'J •'(Ai) 

where / is the Jacobian or functional determinant defined by the formula: 



z(l -y) = Hy, z) 

(|J| taken as absolute value), 



For 



dtp dtp 
dy dz 

dy dz 



dtp dyp 
dy'dz 



dtp d\l/ 



2/2 



dz dy 



fiy, 2), 



|J| = 



1 - z(l - y) 
yi = \ - z{l - y) = <p(y, z), 
1 -z 2/(1 - y) 



[1 - z(l - y)¥ [1 - 2(1 - y)? 

z -a-y) 



1 - 



1 - 2(1 - y) 



The transformation in a double integral implies in general three parts 
(1) the expression of ^(2/12/2) in terms of y, z{2) the determination of the new 
system of limits (3) substitution of dyidyi. The solution of the third part we 
just gave above. The solution of the two first is purely algebraically. The 
first part is a straightforward simple problem which should present no difficulty 
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> See Ggctesat: " Mathematical Analysis " (New York, 1904) pages 266-67. 
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whatsoever to the student and which in conjunction with (3) brings the in- 
tegrands on the form given in formula (V). 

The easiest way to determine the new system of limits is probably by con- 
structing the contour in the new field of integration. The hyperbolas 
yyiji = a and y\jj2 = are in the new field of integration changed into the two 
straight lines y = a and j/ = /3 which determine the limits for the variable y. 
A mere inspection of the expressions for <p{y, z) and tl/{y, z) shows that the two 
straight lines y-i — 1 and j/i = 1 become in the new field z = 1 and z = 
which are the limits for z. 

The contour {A-Cj simply becomes a rectangle bounded by the straight 
lines = 0, 2/=/3, z = l and y = a. The complete transformation finally 
brings the numerator on the form as given in (V). 

Page 75. The question put by Mr. Bing is simply the determination of a 
future event by means of Bayes's Rule. The limits a and /3 become and 1 
respectively and the contour of the field of integration simply becomes the 
area bounded by j/12/2 = 0, 2/2 = 1, J/1J/2 = 1 and yi = 1, i. e., the area enclosed 
between the two axis, the line j/2 = 1, the hyperbola j/12/2 = 1 and the line 
j/i = 1. The transformed contour becomes a square with side equal to unity. 

Chapter VII. 

Page 83. The criticism by the English empiricists is to a certain extent 
due to a misconception of the BernouUian Theorem. "This theorem," Venn 
says, "is generally expressed somewhat as follows: That in the long run all 
events wUl tend to occur with a frequency proportional to their objective 
probabihties." Any one giving careful attention to the deduction of the famous 
theorem will, however, readily notice the fallacy of such a view. Not the 
actual absolute frequencies of the events but the mathematical expectations of 
such events are proportional to the a priori mathematical probability p. The 
fallacy of Mr. Venn hes in his confusing an actual event with its mathematical 
expectation. In other words, he makes the BemoulHan Theorem appear as 
a regular hypothetical judgment whereas as a matter of fact it is a simple 
probability judgment. If one is to take such an erroneous view of the Ber- 
nouUian Theorem one may even be reconciled with another startling statement 
by Venn that "If the chance (against the happening of a certain event) be 
1,023 to 1 it undoubtedly will happen once in 1,024 trials." 

For a clear presentment of the empirical methods and their relation to 
mathematical probabilities and deductive methods see v. Bortkiewicz 
"Kritische Betrachtungen zur theoretischen Statistik" (Jahrb. f. N.-Oe. u. 
Stat. 3 Folge, Ed. 8, 10, 11) and "Die statistischen Generalisationen " (Sd- 
entia, Vol. V). v. Bortkiewicz is but one of the brilliant school of Russian 
statisticians who has made a thorough study of the philosophical aspects of 
statistics. The induction method of J. S. Mill is carried much farther and 
put on a far sounder basis than that originally given by Mill in the brochure 
"Die Statistik als Wissenschaft " by A. A. Tschtjprgff as well as in his Russian 
text "Researches on the Theory of Statistics." The main ideas of the Russian 
writers are also found in KLaufmann's "Theorie und Methoden der Statistik" 
(Tubingen, 1913). 
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Chapter IX. 
Page 95. For a closer approximation of n! see Forsyth, A. R., "On an 
Approximate Expression for x\ " (Brit. Ass. Rep., 1883). Page 107. In this 
discussion it must be remembered that the variables are independent of each 
other. The formula: ((ka) = ki{a) is self evident, but may be proved as 
follows : 

eQca) = ksp = ke{a), e\ka - e(ka)f = k^e[a - e{a)f = /cV(a), or t{ha) = ke{a). 

Page 115. See also a similar discussion by Westergaard in " Mortalitat und 
MorbiUtat" (Jena, 1902), page 187. 

Chapter XL 

The still unfinished series of monographs by Charlier are found in various 
volumes of Meddelande fran Lands Astrnnomiska Ohservalorium (Lund, 
Stockholm) and in Svenska Aktuariejoreningehs Tidsskrifl (Stockholm). 

Page 137. Since all statistical characteristics to greater or less extent are 
effected with mean errors due to sampling it is of importance to be able to 
determine such mean errors in simple algebraic terms. We shall for the present 
confine ourselves to the mean and the dispersion. The mean error in the mean, 
Mb in a BernouUian Series is given by the formula: 



,(M) = ^ = -^^ = -^ . 

The mean error of the dispersion is somewhat difficult to obtain by elementary 
methods since it involves the determinat ion of the mean error of the mean error. 
The mean error square of the mean error square may be gotten by a process 
similar to that of Laplace in § 65-66 by the introduction of the parameter, t, 
in the expression for a' and a* in e[{a — sp)^ — spqf. After several reductions 
this latter, expression may be brought to the form: 2(spogo)^ = 2o-^ (approx.). 
For the dispersion we have: 

<2N 

This formula will be proven under the discussion of frequency curves. 

Chapter XIII. 

Page 184, The Danish engineer, Andrae, discovered a similar correlation 
formula about the same time as Bravais. 

Chapter XIV. 

Page 196. Viewed from the standpoint of elementary errors the expression 
for the frequency curve, (p(x), may be derived in the following fashion: — 
Let Ufi arrange the elementary errors in siriall equidistant groups or intervals 
of magnitude, a, and assume that all the elementary errors when situated in 
the same interval are of equal size, an assumption which is always permissible 
for small values of a. This means that when r is an integral positive number 
all the errors located in the interval ra — 3^a and ra + J^a must be of 
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equal size and equal to ra. The relative frequency or the probability of such 
errors is first of all proportional to the interval, a, and depends in the second 
instance upon a certain — so far unknown — function, f{ra), of the quantity 
ra. For a particular error source, say Qc, we may therefore express the 
probability of the occurrence of an error, ra, as 

afv(ra) 

where fv is the unknown function for which we make no other assumptions 
than those which follow immediately from the properties of mathematical 
probabilities, i. e. 

r = CO 

OSMra) S 1 and y> , Mra) = Hot v = 1, 2, 3, . . . s. 

7* =: CO 

Consider now for the moment the following expression 

j» = 00 

F„(co) = ^j a/„(ra)e''''™ where i = V— 1 



The coefficient of e''""' in the sum is evidently the probability for the occur- 
rence of an error ra from the error source Q,. 

The probability of the occurrence of an error ra from another error source, 
Qu, may similarly be expressed as 

J- ^ CO 



and so on for all the s independent error sources, which we assumed to be 
operative on our statistical object. 

The probability that the resulting sum from the various combinations into 
which the elementary errors from the s sources may enter is found by forming 
the product 

V = 8 

* M = II FvM = Fi(«) F2M Fzio,) . . . Fs{a) 

in accordance with the multiplication theorem of mathematical probabilities 
Writing the above products as 

* (o)) = a [<p (0) + (p (a) e"""' + (p (2a) e^""' + (f (3a) e^""' + . . . 

+ (p(-a) e-""' + (p (-2a) g-Sawi ^ ^ (_3^) e-^'"^+. . .] 

we notice that the coefficient, a(p{ra), of z?''"™ is the probability that the 
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elementary errors from the s error sources will enter into such a combination 
that their sum will fall between ra — J^a and ra + J^a. 

Multiplying on both sides of the above equation by e-''"™ (considering 
au as the independent variable) and integrating with respect to aw between 
the limits — x and + it we find that 

aip {ra) = ^ I ""* W e-''°'^'d (aco). 

In the above integral aw is the independent variable. If a is chosen as 
the independent variable, we have 

/+x/a 
*(w)6-'-«™'dw. 
— It/a 

If now we let r-a = X and let a converge towards zero, in which case 
u = dx, we evidently find that 

+ CO 



<p{x)dx = — / * (w) e-"-^ d< 

2tC ^J CO 



is the infinitely small probability that the sum of the elementary errors 
from the s sources will fall in the infinitely small interval x — }4<ix and 
X + yidx. 

It is evident that by introducing a new function if (w), defined by the rela- 
tion *(w) = '\/2xV'(w), the above equation reduces to (5b) on page 195 if 
we let (p(x) =f{x). 

Also by writing 

we obtain the general form of the frequency curve as shown on the bottom of 
page 196. 



Chapter XVI 

Page 237. The footnote mentions McAlister as one of the ear'iest investi- 
gators who used the geometric mean as the most probable value of the 
observations. I find, however, that McAlister's work was anteceded by that 
of Thiele and Gram. Thiele as early as 1866 used the geometric mean as the 
most probable value in a series of estimates of the distances of double stars, 
in a Danish monograph entitled Undersogelse af Omlobsbevcegehen i Dobbeltstjer- 
nesystemet Gamma Virginis. (Investigation on the movements of rotation in 
the double star system Gamma Vi ginis.) 



ADDENDA. 289 

Page 237. The integral in question may be evaluated as follows: — 

Let ( = (z — m) •.n, OT nt = z — m and ndt = dz. 
Hence the integral may be written as 



~|- OO -|~ CO t2 

n\/2 



vj kV2xJ 



Are(r+1)™ p nUr + l) -2 

e dt 



-Dm n 
■\/2i J ' 

— OD 

^ {r + l)m |^(r + l)! P - i [( - n(r + 



dt 



If we now let [( — n{r + 1)] = u, we have dt = du, and the last expression 
reduces to 

+ 00 

N 



V2 



J (r + Dm !L' (« + 1)! P ^ (r + l)m T (r + 1^) 

= e e2 I e ^ du = Ne e ^ , since the 

2ic t/ 



+ CO 



2 / — 

e du = •V2ii 



THE MATHEMATICAL THEORY OF PROBABILITIES 

ByARNEFISHEE 

ERRATA 

Page 232, line 2: 

For ki=rik^ read ki = riki^ 

Page 237, lines 16 and 17 to read: 

J + oo /•■» l/ iogx— m \2 

x'Fix)dx = {n^|2T)-^NJx'e ^V » -'da; 

-00 O 

J^oo _l/ g— m \2 

— 00 

on the assumption that z ot log x is normally distributed. 
Page 252, line 13: +48 to read -48. 
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elementary errors from the s error sources will enter into such a combination 
that their sum wUl faU between ra — J^a and ra. + }^a. 

Multiplying on both sides of the above equation by e~''°^'' (considering 
ao) as the independent variable) and integrating with respect to aw between 
the limits — x and + x we find that 

In the above integral aw is the independent variable. If w is chosen as 
the independent variable, we have 



2xi/ — Tc/a 



a<p {ra) = ^ I * (co) e-''"™ dw 



If now we let ra = x and let u converge towards zero, in which case 
= dx, we evidently find that 



2x ^ _ , 



ip (x) dx = I * (w) e 



is the infinitely small probability that the sum of the elementary errors 
from the s sources wiU fall in the infinitely small interval x — J^di and 
X + y^dx. 

It is evident that by introducing a new function ^ (u), defined by the rela- 
tion *(u) = \/2x '/'(w), the above equation reduces to (5b) on page 195 if 
we let <p{i) = f{x). 

Also by writing 

^^ ^j2 ^li " ^■••= V2x\J'(w) 

we obtain the general form of the frequency curve as shown on the bottom of 
page 196. 



Chapter XVI 

Page 237. The footnote mentions McAlister as one of the ear'iest investi- 
gators who used the geometric mean as the most probable value of the 
observations. I find, however, that MoAlister's work was anteceded by that 
of Thiele and Gram. Thiele as early as 1866 used the geometric mean as the 
most probable value in a series of estimates of the distances of double stars, 
in a Danish monograph entitled Undersogelse af Omlobsbevnegelsen i Dobbehstjer- 
nesysiemel Gamma Virginis. (Investigation on the movements of rotation in 
the double star system Gamma Vi ginis.) 
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Page 237. The integral in question may be evaluated as follows:— 
Let t = (z — m):n,ovnt = z — to and ndt = dz. 
Hence the integral may be written as 



+ 



e e dt 



fj (r + l)m |-(r + l)2 r _ i [( _ „(r + 1)]! 

CO 

If we now let [t — n{r + 1)] = u, we have dt = du, and the last expression 
reduces to 

N (r + l)m E (re + 1)2 



N fr + l)m "''(re + l)^ r _«= (r + l)m «'(r + 12) 

■^2^ ^ * f e 2 du = Ne e 2 , since the 

— 03 
I- CO 

e du = ■\/2% 



latter integral / 2 



THE MATHEMATICAL THEORY OF PROBABILITIES 

By ARNE FISHER 

ERRATA 

Page 232, line 2: 

For ki = rik^ read ki=riki^ 

Page 237, lines 16 and 17 to read: 

/-f-00 A*oo 1/ log X — m \2 

x'F{x)dx = {nyl2'ir)-^Njx'e ^^ " ' dx 

— 00 o 

/-|-oo —1/ 8 — m \l 

— 00 

on the assumption that z or log x is normally distributed. 
Page 262, line 13: +48 to read -48. 
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An Elementary Treatise on 
Frequency Curves 

And their Application to the Construction 
of Mortality Tables 

By ARNE FISHER 

English translation by E. A, VIGFUSSON 
With an Introduction by Professor RAYMOND PEARL 

Department of Biometry and Vital Statistics of 
the Johns Hopkins University 

(Pp. aas+xv) 

This book falls into two parts of which the first gives an elemen- 
tary presentation of the theory of frequency functions along similar 
lines as those developed by Mr. Fisher in his book on Probabilities. 
The second part, as pointed out in Mr. Vigfusson's preface, con- 
stitutes an entirely new departure in the analysis of mortality 
statistics. The author has set himself the difficult task to con- 
struct a complete mortality table from mortuary records by sex, 
attained age at death and causes of death, but without knowledge 
of the exposed to risk at various ages. The accomplishment of this 
problem has been made possible by means of a biological hypothesis 
and a proper classification of the causes of death upon biological 
principles. Once accepted the proposed hypothesis will make it 
possible to study the laws of human mortality in directions which 
hitherto have been regarded as impossible. Mr. Fisher has applied 
his new method to more than 25 population or occupational groups 
and gives in this book the detailed results of some of his investiga- 
tions in the way of 6 complete mortality tables for Michigan 
Males (1909-1915), Massachusetts Males (1914-1916), American 
Locomotive Engineers (1913-1917), American Coal Miners 
(1913-1917), Japanese Assured Males (1914-1917) and White 



Industrial Assured Males of the Metropolitan Life Insurance Co. 
(1911-1916). 

As a systematic treatise on frequency curves and their applica- 
tion to mortality studies this book should prove of great practical 
value not only to students of statistical methods, but to actu- 
aries, statisticians, health officers, biologists and students of 
general science as well. 

Comments of Specialists 

"Orthodoxy and discovery are as incompatible Intellectually as oil and water are 
physically, a cosmic law often overlooked by our "safe and sane" scientific gentry. 
This book is an outstanding feature that this law is still in operation. . . . 
It may fairly be regarded as fundamentally the most significant advance in actuarial 
theory since Halley. .... It opens out wonderful possibilities of research 

on the laws of mortality in directions which hitherto have been wholly impossible 
of attack. The criterion by which the significance of a new technique in any branch 
of science is evaluated, is just this at the degree to which it opens up new fields of 
research. By this criterion Fisher's work stands in a high and secure position.' 
{Extract from Professor Pearl's Introduction.) 



"Fisher's novel method has injected new blood in the old body of actuarial 
science." (C. Burrau.) 

"This new and novel idea meets in reality a very frequent need. It represents 
a supplement to the former tools of the actuary and makes possible the utilization 
of a statistical material, which according to the requirements of the older systems 
was considered as being of no value." 

(Extract from Forsikringstidende' s report of discussion in the Norwegian Ac- 
tuarial Society, June, 1920.) 

"Since particularly in industrial statistics, or in general statistical inquiries under 
war conditions, it is easier to obtain accurate data of deaths at ages than of exposed 
to risk the success of the method is encouraging. . . . The subject is one of 

peculiar interest at the present time." 

{Journal of the Royal Statistical Society, London, 1918.) 



From the theoretical point of view the method of Fisher is interesting. His 
proposal to decompose the mortality according to the different causes of death is 
entirely in conformity with the spirit of modern science which aims to analyze the 
phenomena by their differential parts. From the practical point of view the method 
is readily applied provided one has a double entry table of the mortuary records by 
age and cause of death. 

{Bulletin de V Association des Actuaires Suisses. The Journal of the Association 
of Svnss Actuaries, 1919.) 
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